Methods for detection of donor-derived cell-free DNA

ABSTRACT

The present disclosure provides methods for determining the status of an allograft within a transplant recipient from genotypic data measured from a mixed sample of DNA comprising DNA from both the transplant recipient and from the donor. The mixed sample of DNA may be preferentially enriched at a plurality of polymorphic loci in a way that minimizes the allelic bias, for example using massively multiplexed targeted PCR.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. Utility application Ser. No. 17/165,592, filed Feb. 2, 2021. U.S. Utility application Ser. No. 17/165,592 is a continuation of U.S. Utility application Ser. No. 17/252,068, filed Dec. 14, 2020. U.S. Utility application Ser. No. 17/252,068 is a National Stage Entry of PCT Application No. PCT/US2019/040603, filed Jul. 3, 2019. PCT Application No. PCT/US2019/040603 claims priority to U.S. Provisional Application No. 62/693,833, filed Jul. 3, 2018; U.S. Provisional Application No. 62/715,178, filed Aug. 6, 2018; U.S. Provisional Application No. 62/781,882, filed Dec. 19, 2018; and U.S. Provisional Application No. 62/834,315, filed Apr. 15, 2019. Each of the applications cited above is hereby incorporated by reference in its entirety.

FIELD

The present disclosure relates generally to methods for detecting donor-derived DNA within a transplant recipient.

BACKGROUND

There are currently about 190,000 living kidney transplant recipients in the United States, and about 20,000 kidney transplant surgeries are performed annually. Rapid detection of kidney allograft injury and/or rejection remains a challenge. Previous attempts to use serum creatinine to determine kidney transplant status have lacked specificity, and transplant biopsies are invasive and costly and may lead to late diagnosis of transplant injury and/or rejection.

Because the immune system recognizes an allograft as foreign to the body and activates various immune mechanisms to reject the allograft, it is often necessary to medically suppress the normal immune response that would otherwise reject the transplant. Therefore, there is a need for a non-invasive test for transplant rejection that is more sensitive and more specific than conventional tests.

SUMMARY

In one aspect, the present invention relates to a method of quantifying the amount of donor-derived cell-free DNA (dd-cfDNA) in a blood sample of a transplant recipient, comprising: extracting DNA from the blood sample of the transplant recipient, wherein the DNA comprises donor-derived cell-free DNA and recipient-derived cell-free DNA; performing targeted amplification at 500-50,000 target loci in a single reaction volume using 500-50,000 primer pairs, wherein the target loci comprise polymorphic loci and non-polymorphic loci, and wherein each primer pair is designed to amplify a target sequence of no more than 100 bp; and quantifying the amount of donor-derived cell-free DNA in the amplification products.

In another aspect, the present invention relates to a method of quantifying the amount of donor-derived cell-free DNA (dd-cfDNA) in a blood sample of a transplant recipient, comprising: extracting DNA from the blood sample of the transplant recipient, wherein the DNA comprises donor-derived cell-free DNA and recipient-derived cell-free DNA, and wherein the extracting step comprises size selection to enrich for donor-derived cell-free DNA and reduce the amount of recipient-derived cell-free DNA released from bursting white blood cells; performing targeted amplification at 500-50,000 target loci in a single reaction volume using 500-50,000 primer pairs, wherein the target loci comprise polymorphic loci and non-polymorphic loci; and quantifying the amount of donor-derived cell-free DNA in the amplification products.

In another aspect, the present invention relates to a method of detecting donor-derived cell-free DNA (dd-cfDNA) in a blood sample of a transplant recipient, comprising: extracting DNA from the blood sample of the transplant recipient, wherein the DNA comprises donor-derived cell-free DNA and recipient-derived cell-free DNA; performing targeted amplification at 500-50,000 target loci in a single reaction volume using 500-50,000 primer pairs, wherein the target loci comprise polymorphic loci and non-polymorphic loci; sequencing the amplification products by high-throughput sequencing; and quantifying the amount of donor-derived cell-free DNA.

In some embodiments, the method further comprises performing universal amplification of the extracted DNA. In some embodiments, the universal amplification preferentially amplifies donor-derived cell-free DNA over recipient-derived cell-free DNA released from bursting white blood cells.

In some embodiments, the transplant recipient is a mammal. In some embodiments, the transplant recipient is a human.

In some embodiments, the transplant recipient has received a transplant selected from organ transplant, tissue transplant, cell transplant, and fluid transplant. In some embodiments, the transplant recipient has received a transplant selected from kidney transplant, liver transplant, pancreas transplant, intestinal transplant, heart transplant, lung transplant, heart/lung transplant, stomach transplant, testis transplant, penis transplant, ovary transplant, uterus transplant, thymus transplant, face transplant, hand transplant, leg transplant, bone transplant, bone marrow transplant, cornea transplant, skin transplant, pancreas islet cell transplant, heart valve transplant, blood vessel transplant, and blood transfusion. In some embodiments, the transplant recipient has received a kidney transplant.

In some embodiments, the quantifying step comprises determining the percentage of donor-derived cell-free DNA out of the total of donor-derived cell-free DNA and recipient-derived cell-free DNA in the blood sample. In some embodiments, the quantifying step comprises determining the number of copies of donor-derived cell-free DNA per volume unit of the blood sample.
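As an illustration of the percentage and copies-per-volume calculations, the following minimal sketch assumes that donor and recipient genotypes are already known and uses only informative SNPs at which the recipient is homozygous and the donor carries an allele the recipient lacks. At such a locus the expected fraction of donor-allele reads is f when the donor is homozygous for that allele and f/2 when heterozygous. The `Locus` record, the pooled ratio estimator, the per-mL-of-plasma unit, and the conversion factor of about 3.3 pg per haploid human genome are assumptions for illustration, not the disclosed method.

```python
from __future__ import annotations

from dataclasses import dataclass

# Approximate mass of one haploid human genome in picograms (assumption used
# to convert a cfDNA mass concentration into genome copies).
PG_PER_HAPLOID_GENOME = 3.3


@dataclass
class Locus:
    depth: int              # total reads covering the SNP
    donor_reads: int        # reads carrying the donor-specific allele
    donor_homozygous: bool  # True if the donor carries two copies of that allele


def donor_fraction(loci: list[Locus]) -> float:
    """Pooled estimate of the donor fraction f over informative loci.

    Expected donor-allele reads per locus are depth * f (donor homozygous)
    or depth * f / 2 (donor heterozygous); f is the ratio of observed
    donor-allele reads to that expectation per unit f.
    """
    observed = sum(locus.donor_reads for locus in loci)
    per_unit_f = sum(
        locus.depth * (1.0 if locus.donor_homozygous else 0.5) for locus in loci
    )
    return observed / per_unit_f


def donor_copies_per_ml(fraction: float, cfdna_ng_per_ml: float) -> float:
    """Donor haploid-genome copies per mL from the fraction and total cfDNA."""
    total_copies = cfdna_ng_per_ml * 1000.0 / PG_PER_HAPLOID_GENOME
    return fraction * total_copies
```

For example, a heterozygous-donor locus with 10 of 1,000 reads and a homozygous-donor locus with 20 of 1,000 reads both point to f = 0.02, that is, 2% dd-cfDNA.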

In some embodiments, the method further comprises detecting the occurrence or likely occurrence of active rejection of the transplant using the quantified amount of donor-derived cell-free DNA. In some embodiments, the method is performed without prior knowledge of donor genotypes.

In some embodiments, each primer pair is designed to amplify a target sequence of about 50-100 bp. In some embodiments, each primer pair is designed to amplify a target sequence of no more than 75 bp. In some embodiments, each primer pair is designed to amplify a target sequence of about 60-75 bp. In some embodiments, each primer pair is designed to amplify a target sequence of about 65 bp.
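A simplified geometric model suggests why short amplicons suit fragmented cfDNA. Assume (for illustration only) that every cfDNA fragment has the same length L and that its position relative to the target is uniform. A fragment that contains a given base can start at L offsets, and L - a + 1 of those also contain an entire a-bp amplicon, so only that fraction of fragments is amplifiable. The 166 bp fragment length used below is an assumption based on the typical mononucleosomal cfDNA size.

```python
def intact_fraction(fragment_len: int, amplicon_len: int) -> float:
    """Fraction of fragments covering a target base that contain the whole amplicon.

    Simplified model: fixed fragment length, uniformly distributed start
    positions. Returns 0 when the amplicon is longer than the fragment.
    """
    if amplicon_len > fragment_len:
        return 0.0
    return (fragment_len - amplicon_len + 1) / fragment_len


# Compare the amplicon lengths discussed above for an assumed 166 bp fragment.
for amplicon in (100, 75, 65):
    print(amplicon, round(intact_fraction(166, amplicon), 3))
```

Under these assumptions, roughly 40% of fragments remain amplifiable for a 100 bp amplicon versus about 61% for a 65 bp amplicon.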

In some embodiments, the targeted amplification comprises amplifying at least 1,000 polymorphic loci in a single reaction volume. In some embodiments, the targeted amplification comprises amplifying at least 2,000 polymorphic loci in a single reaction volume. In some embodiments, the targeted amplification comprises amplifying at least 5,000 polymorphic loci in a single reaction volume. In some embodiments, the targeted amplification comprises amplifying at least 10,000 polymorphic loci in a single reaction volume.

In some embodiments, the method further comprises measuring an amount of one or more alleles at the target loci that are polymorphic loci. In some embodiments, the polymorphic loci and the non-polymorphic loci are amplified in a single reaction.

In some embodiments, the quantifying step comprises detecting the amplified target loci using a microarray. In some embodiments, the quantifying step does not comprise using a microarray.

In some embodiments, the targeted amplification comprises simultaneously amplifying 500-50,000 target loci in a single reaction volume using (i) at least 500-50,000 different primer pairs, or (ii) at least 500-50,000 target-specific primers and a universal or tag-specific primer.

In a further aspect, the present invention relates to a method of determining the likelihood of transplant rejection within a transplant recipient, the method comprising: extracting DNA from a blood sample of the transplant recipient, wherein the DNA comprises donor-derived cell-free DNA and recipient-derived cell-free DNA; performing universal amplification of the extracted DNA; performing targeted amplification at 500-50,000 target loci in a single reaction volume using 500-50,000 primer pairs, wherein the target loci comprise polymorphic loci and non-polymorphic loci; sequencing the amplification products by high-throughput sequencing; and quantifying the amount of donor-derived cell-free DNA in the blood sample, wherein a greater amount of dd-cfDNA indicates a greater likelihood of transplant rejection.

In a further aspect, the present invention relates to a method of diagnosing a transplant within a transplant recipient as undergoing acute rejection, the method comprising: extracting DNA from a blood sample of the transplant recipient, wherein the DNA comprises donor-derived cell-free DNA and recipient-derived cell-free DNA; performing universal amplification of the extracted DNA; performing targeted amplification at 500-50,000 target loci in a single reaction volume using 500-50,000 primer pairs, wherein the target loci comprise polymorphic loci and non-polymorphic loci; sequencing the amplification products by high-throughput sequencing; and quantifying the amount of donor-derived cell-free DNA in the blood sample, wherein an amount of dd-cfDNA of greater than 1% indicates that the transplant is undergoing acute rejection.

In some embodiments, the transplant rejection is antibody-mediated transplant rejection. In some embodiments, the transplant rejection is T-cell-mediated transplant rejection.

In some embodiments, an amount of dd-cfDNA of less than 1% indicates that the transplant is either undergoing borderline rejection, undergoing other injury, or stable.
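The 1% decision rule described above can be expressed as a small function. This is a sketch only: the function name is hypothetical, and treating a value of exactly 1% as below the cutoff is an assumption, because the text specifies "greater than 1%" and "less than 1%" but not the boundary itself.

```python
def interpret_dd_cfdna(percent: float, cutoff: float = 1.0) -> str:
    """Map a dd-cfDNA percentage to the categories described in the text.

    Values strictly above the cutoff are called acute rejection; everything
    else (including a value exactly at the cutoff, by assumption) falls into
    the borderline / other injury / stable group.
    """
    if percent > cutoff:
        return "acute rejection"
    return "borderline rejection, other injury, or stable"
```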

In a further aspect, the present invention relates to a method of monitoring immunosuppressive therapy in a subject, the method comprising: extracting DNA from a blood sample of the transplant recipient, wherein the DNA comprises donor-derived cell-free DNA and recipient-derived cell-free DNA; performing universal amplification of the extracted DNA; performing targeted amplification at 500-50,000 target loci in a single reaction volume using 500-50,000 primer pairs, wherein the target loci comprise polymorphic loci and non-polymorphic loci; sequencing the amplification products by high-throughput sequencing; and quantifying the amount of donor-derived cell-free DNA in the blood sample, wherein a change in levels of dd-cfDNA over a time interval is indicative of transplant status.

In some embodiments, the method further comprises adjusting immunosuppressive therapy based on the levels of dd-cfDNA over the time interval.

In some embodiments, an increase in the levels of dd-cfDNA is indicative of transplant rejection and a need for adjusting immunosuppressive therapy. In some embodiments, no change or a decrease in the levels of dd-cfDNA indicates transplant tolerance or stability, and a need for adjusting immunosuppressive therapy.
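One way to turn "an increase in the levels of dd-cfDNA" over a time interval into a number is to fit a least-squares slope to serial measurements. The sketch below does that; the per-30-day unit, the function names, and the 0.1 percentage-point-per-month rise threshold are hypothetical choices for illustration and are not taken from this disclosure.

```python
def monthly_slope(samples):
    """Least-squares slope of dd-cfDNA (%) per 30 days.

    `samples` is a list of (days_after_transplant, dd_cfdna_percent) pairs.
    """
    if len(samples) < 2:
        raise ValueError("need at least two time points")
    xs = [day for day, _ in samples]
    ys = [pct for _, pct in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("time points must differ")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return 30.0 * sxy / sxx


def classify_trend(samples, rise_per_month=0.1):
    """Label a series as rising or not; the threshold is hypothetical."""
    if monthly_slope(samples) > rise_per_month:
        return "rising"
    return "stable or falling"
```

A fitted slope is less sensitive to a single noisy draw than comparing only the first and last samples, at the cost of needing several time points.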

In some embodiments, an amount of dd-cfDNA of greater than 1% indicates that the transplant is undergoing acute rejection. In some embodiments, the transplant rejection is antibody-mediated transplant rejection. In some embodiments, the transplant rejection is T-cell-mediated transplant rejection.

In some embodiments, an amount of dd-cfDNA of less than 1% indicates that the transplant is either undergoing borderline rejection, undergoing other injury, or stable.

In some embodiments, the method does not comprise genotyping the transplant donor and/or the transplant recipient.
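To illustrate how a donor fraction can be estimated without genotyping either party, the sketch below uses a generic mixture model, which is not necessarily the approach of this disclosure. For each SNP it sums the binomial likelihood of the observed B-allele reads over all nine recipient/donor genotype pairs, weighted by Hardy-Weinberg priors from a known population allele frequency, and then grid-searches the donor fraction f that maximizes the total likelihood. The assumptions are stated loudly: an unrelated donor, independent genotypes in Hardy-Weinberg equilibrium, known population allele frequencies, a flat 0.1% sequencing error rate, and a toy simulator whose parameters are hypothetical.

```python
import math
import random

GENOTYPES = (0, 1, 2)  # copies of the B allele carried by one person


def hwe_prior(p):
    """Hardy-Weinberg probabilities of 0, 1 or 2 B alleles given frequency p."""
    return ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)


def locus_loglik(f, p, depth, b_reads, err=0.001):
    """Log-likelihood of the B-allele read count at one SNP for donor fraction f.

    Genotypes are unknown, so the likelihood is summed over all nine
    recipient/donor genotype pairs weighted by independent HWE priors.
    The binomial coefficient is omitted because it does not depend on f.
    """
    prior = hwe_prior(p)
    terms = []
    for g_rec in GENOTYPES:
        for g_don in GENOTYPES:
            weight = prior[g_rec] * prior[g_don]
            if weight == 0.0:
                continue
            q = (1 - f) * g_rec / 2 + f * g_don / 2  # expected B-read fraction
            q = q * (1 - err) + (1 - q) * err        # flat sequencing error
            terms.append(math.log(weight) + b_reads * math.log(q)
                         + (depth - b_reads) * math.log(1 - q))
    top = max(terms)  # log-sum-exp avoids underflow at high depth
    return top + math.log(sum(math.exp(t - top) for t in terms))


def estimate_donor_fraction(loci, step=0.005, max_f=0.3):
    """Grid-search maximum-likelihood donor fraction.

    `loci` is a list of (population_B_frequency, depth, b_reads) tuples.
    """
    best_f, best_ll = 0.0, -math.inf
    for i in range(int(round(max_f / step)) + 1):
        f = i * step
        ll = sum(locus_loglik(f, p, n, k) for p, n, k in loci)
        if ll > best_ll:
            best_f, best_ll = f, ll
    return best_f


def simulate(n_loci=500, depth=300, f=0.1, seed=7):
    """Toy simulation of plasma reads from an unrelated donor/recipient pair."""
    rng = random.Random(seed)
    loci = []
    for _ in range(n_loci):
        p = rng.uniform(0.2, 0.8)
        g_rec = sum(rng.random() < p for _ in range(2))
        g_don = sum(rng.random() < p for _ in range(2))
        q = (1 - f) * g_rec / 2 + f * g_don / 2
        b_reads = sum(rng.random() < q for _ in range(depth))
        loci.append((p, depth, b_reads))
    return loci


if __name__ == "__main__":
    print(estimate_donor_fraction(simulate()))
```

Loci where the recipient is homozygous carry most of the information, because any donor allele there appears as a small, distinct read fraction of about f/2 or f.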

In some embodiments, the method further comprises measuring an amount of one or more alleles at the target loci that are polymorphic loci.

In some embodiments, the target loci comprise at least 1,000 polymorphic loci, or at least 2,000 polymorphic loci, or at least 5,000 polymorphic loci, or at least 10,000 polymorphic loci.

In some embodiments, the target loci are amplified in amplicons of about 50-100 bp in length, or about 50-90 bp in length, or about 60-80 bp in length, or about 60-75 bp in length, or about 65 bp in length.

In some embodiments, the transplant recipient is a human. In some embodiments, the transplant recipient has received a transplant selected from a kidney transplant, liver transplant, pancreas transplant, islet cell transplant, intestinal transplant, heart transplant, lung transplant, bone marrow transplant, heart valve transplant, or a skin transplant. In some embodiments, the transplant recipient has received a kidney transplant.

In some embodiments, the extracting step comprises size selection to enrich for donor-derived cell-free DNA and reduce the amount of recipient-derived cell-free DNA released from bursting white blood cells.

In some embodiments, the universal amplification step preferentially amplifies donor-derived cell-free DNA over recipient-derived cell-free DNA released from bursting white blood cells.

In some embodiments, the method comprises longitudinally collecting a plurality of blood samples from the transplant recipient after transplantation, and repeating steps (a) to (e) for each blood sample collected. In some embodiments, the method comprises collecting and analyzing blood samples from the transplant recipient for a time period of about three months, or about six months, or about twelve months, or about eighteen months, or about twenty-four months, etc. In some embodiments, the method comprises collecting blood samples from the transplant recipient at an interval of about one week, or about two weeks, or about three weeks, or about one month, or about two months, or about three months, etc.

In some embodiments, the method has a sensitivity of at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98% in identifying acute rejection (AR) over non-AR with a cutoff threshold of 1% dd-cfDNA and a confidence interval of 95%.

In some embodiments, the method has a specificity of at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90% in identifying AR over non-AR with a cutoff threshold of 1% dd-cfDNA and a confidence interval of 95%.

In some embodiments, the method has an area under the curve (AUC) of at least 0.8, or at least 0.85, or at least 0.9, or at least 0.95 in identifying AR over non-AR with a cutoff threshold of 1% dd-cfDNA and a confidence interval of 95%.
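The sensitivity, specificity, and AUC figures above can be computed from per-patient dd-cfDNA percentages grouped by biopsy-confirmed status. The sketch below is one standard way to do so: calls above the 1% cutoff count as AR, and AUC is computed as the Mann-Whitney probability that a randomly chosen AR sample scores higher than a randomly chosen non-AR sample. The function names are hypothetical.

```python
def sens_spec_at_cutoff(ar, non_ar, cutoff=1.0):
    """Sensitivity and specificity when dd-cfDNA (%) > cutoff is called AR."""
    true_pos = sum(1 for v in ar if v > cutoff)
    true_neg = sum(1 for v in non_ar if v <= cutoff)
    return true_pos / len(ar), true_neg / len(non_ar)


def roc_auc(ar, non_ar):
    """AUC as the probability that an AR sample outscores a non-AR sample.

    Computed by exhaustive pairwise comparison (Mann-Whitney statistic);
    ties contribute one half.
    """
    wins = 0.0
    for a in ar:
        for b in non_ar:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(ar) * len(non_ar))
```

Unlike sensitivity and specificity, the AUC does not depend on the 1% cutoff; it summarizes separation across all possible thresholds.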

In some embodiments, the method has a sensitivity of at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98% in identifying AR over normal, stable allografts (STA) with a cutoff threshold of 1% dd-cfDNA and a confidence interval of 95%.

In some embodiments, the method has a specificity of at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98% in identifying AR over STA with a cutoff threshold of 1% dd-cfDNA and a confidence interval of 95%.

In some embodiments, the method has an AUC of at least 0.8, or at least 0.85, or at least 0.9, or at least 0.95, or at least 0.98, or at least 0.99 in identifying AR over STA with a cutoff threshold of 1% dd-cfDNA and a confidence interval of 95%.
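The text pairs each performance figure with a 95% confidence interval but does not say how the interval is computed. One common choice for a proportion such as sensitivity or specificity is the Wilson score interval, sketched below as an assumption rather than the method actually used.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion (z of about 1.96 gives 95%).

    Returns (lower, upper). Unlike the simple Wald interval, it stays
    within [0, 1] and behaves sensibly near 0% or 100%.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    phat = successes / total
    denom = 1 + z * z / total
    centre = (phat + z * z / (2 * total)) / denom
    half = z * math.sqrt(phat * (1 - phat) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half
```

For example, 45 correctly identified AR cases out of 50 gives a point sensitivity of 90% with a Wilson 95% interval of roughly 79% to 96%.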

In some embodiments, the method has a sensitivity as determined by a limit of blank (LoB) of 0.5% or less, and a limit of detection (LoD) of 0.5% or less. In some embodiments, LoB is 0.23% or less and LoD is 0.29% or less. In some embodiments, the sensitivity is further determined by a limit of quantitation (LoQ). In some embodiments, LoQ is 10 times greater than the LoD; LoQ may be 5 times greater than the LoD; LoQ may be 1.5 times greater than the LoD; LoQ may be 1.2 times greater than the LoD; LoQ may be 1.1 times greater than the LoD; or LoQ may be equal to or greater than the LoD. In some embodiments, LoB is equal to or less than 0.04%, LoD is equal to or less than 0.05%, and/or LoQ is equal to the LoD.
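The text reports LoB and LoD values without stating how they are derived. A widely used parametric approach (in the style of CLSI EP17) sets LoB at the 95th percentile of blank measurements and LoD at the level whose measurements exceed LoB 95% of the time. The sketch below assumes normally distributed results and omits the small-sample correction factor that EP17 applies; the inputs would be measured donor fractions from blank (0% donor) and low-level spiked replicates.

```python
import statistics

Z_95_ONE_SIDED = 1.645  # one-sided 95% normal quantile


def limit_of_blank(blank_results):
    """LoB = mean(blank) + 1.645 * SD(blank)."""
    return statistics.mean(blank_results) + Z_95_ONE_SIDED * statistics.stdev(blank_results)


def limit_of_detection(lob, low_level_results):
    """LoD = LoB + 1.645 * SD(low-level sample)."""
    return lob + Z_95_ONE_SIDED * statistics.stdev(low_level_results)
```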

In some embodiments, the method has an accuracy as determined by evaluating a linearity value obtained from linear regression analysis of measured donor fractions as a function of the corresponding attempted spike levels, wherein the linearity value is an R² value from about 0.98 to about 1.0. In some embodiments, the R² value is 0.999. In some embodiments, the method has an accuracy as determined by using linear regression on measured donor fractions as a function of the corresponding attempted spike levels to calculate a slope value and an intercept value, wherein the slope value is from about 0.9 to about 1.2 and the intercept value is from about −0.0001 to about 0.01. In some embodiments, the slope value is approximately 1, and the intercept value is approximately 0.
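The linearity analysis above is an ordinary least-squares fit of measured donor fraction on attempted spike level. The minimal sketch below computes slope, intercept, and R² and checks them against the ranges stated in the text. The function names are hypothetical, and clamping R² at 1.0 only guards against floating-point rounding.

```python
def linearity(attempted, measured):
    """OLS of measured donor fraction on attempted spike level.

    Returns (slope, intercept, r_squared); for simple linear regression
    R^2 equals the squared Pearson correlation, sxy^2 / (sxx * syy).
    """
    n = len(attempted)
    mean_x = sum(attempted) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in attempted)
    syy = sum((y - mean_y) ** 2 for y in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(attempted, measured))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = min(1.0, sxy ** 2 / (sxx * syy))  # guard against rounding above 1
    return slope, intercept, r_squared


def meets_accuracy_ranges(slope, intercept, r_squared):
    """Check the ranges stated in the text: slope 0.9-1.2,
    intercept -0.0001 to 0.01, R^2 0.98-1.0."""
    return (0.9 <= slope <= 1.2
            and -0.0001 <= intercept <= 0.01
            and 0.98 <= r_squared <= 1.0)
```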

In some embodiments, the method has a precision as determined by calculating a coefficient of variation (CV), wherein the CV is less than about 10.0%. In some embodiments, the CV is less than about 6%. In some embodiments, the CV is less than about 4%. In some embodiments, the CV is less than about 2%. In some embodiments, the CV is less than about 1%.
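The coefficient of variation is the sample standard deviation of replicate measurements divided by their mean, expressed as a percentage. A minimal sketch, assuming the replicates are repeated dd-cfDNA measurements of the same material:

```python
import statistics


def coefficient_of_variation(replicates):
    """CV (%) = 100 * sample standard deviation / mean of the replicates."""
    mean = statistics.mean(replicates)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * statistics.stdev(replicates) / mean
```

For example, replicates of 0.9%, 1.0%, and 1.1% dd-cfDNA have a CV of 10%.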

In some embodiments, the AR is antibody-mediated rejection (ABMR). In some embodiments, the AR is T-cell-mediated rejection (TCMR).

Further disclosed herein are methods for detection of transplant donor-derived cell-free DNA (dd-cfDNA) in a sample from a transplant recipient. In some embodiments, in the methods disclosed herein, the transplant recipient is a mammal. In some embodiments, the transplant recipient is a human. In some embodiments, the transplant recipient has received a transplant selected from a kidney transplant, liver transplant, pancreas transplant, islet cell transplant, intestinal transplant, heart transplant, lung transplant, bone marrow transplant, heart valve transplant, or a skin transplant. In some embodiments, the transplant recipient has received a kidney transplant. In some embodiments, the method may be performed on transplant recipients on the day of or after transplant surgery, up to a year following transplant surgery.

In some embodiments, disclosed herein is a method of amplifying target loci of donor-derived cell-free DNA (dd-cfDNA) from a blood sample of a transplant recipient, the method comprising: a) extracting DNA from the blood sample of the transplant recipient, wherein the DNA comprises cell-free DNA derived from both the transplanted cells and from the transplant recipient; b) enriching the extracted DNA at target loci, wherein the target loci comprise 50 to 5000 target loci comprising polymorphic loci and non-polymorphic loci; and c) amplifying the target loci.

In some embodiments, disclosed herein is a method of detecting donor-derived cell-free DNA (dd-cfDNA) in a blood sample from a transplant recipient, the method comprising: a) extracting DNA from the blood sample of the transplant recipient, wherein the DNA comprises cell-free DNA derived from both the transplanted cells and from the transplant recipient; b) enriching the extracted DNA at target loci, wherein the target loci comprise 50 to 5000 target loci comprising polymorphic loci and non-polymorphic loci; c) amplifying the target loci; d) contacting the amplified target loci with probes that specifically hybridize to target loci; and e) detecting binding of the target loci with the probes, thereby detecting dd-cfDNA in the blood sample. In some embodiments, the probes are labelled with a detectable marker.

In some embodiments, disclosed herein is a method of determining the likelihood of transplant rejection within a transplant recipient, the method comprising: a) extracting DNA from a blood sample of the transplant recipient, wherein the DNA comprises cell-free DNA derived from both the transplanted cells and from the transplant recipient; b) enriching the extracted DNA at target loci, wherein the target loci comprise 50 to 5000 target loci comprising polymorphic loci and non-polymorphic loci; c) amplifying the target loci; and d) measuring an amount of transplant DNA and an amount of recipient DNA in the recipient blood sample; wherein a greater amount of dd-cfDNA indicates a greater likelihood of transplant rejection.

In some embodiments, disclosed herein is a method of diagnosing a transplant within a transplant recipient as undergoing acute rejection, the method comprising: a) extracting DNA from a blood sample of the transplant recipient, wherein the DNA comprises cell-free DNA derived from both the transplanted cells and from the transplant recipient; b) enriching the extracted DNA at target loci, wherein the target loci comprise 50 to 5000 target loci comprising polymorphic loci and non-polymorphic loci; c) amplifying the target loci; and d) measuring an amount of transplant DNA and an amount of recipient DNA in the recipient blood sample; wherein an amount of dd-cfDNA of greater than 1% indicates that the transplant is undergoing acute rejection.

In some embodiments, in the methods disclosed herein, the transplant rejection is antibody-mediated transplant rejection. In some embodiments, the transplant rejection is T-cell-mediated transplant rejection. In some embodiments, an amount of dd-cfDNA of less than 1% indicates that the transplant is either undergoing borderline rejection, undergoing other injury, or stable.

In some embodiments, disclosed herein is a method of monitoring immunosuppressive therapy in a subject, the method comprising: a) extracting DNA from a blood sample of the transplant recipient, wherein the DNA comprises cell-free DNA derived from both the transplanted cells and from the transplant recipient; b) enriching the extracted DNA at target loci, wherein the target loci comprise 50 to 5000 target loci comprising polymorphic loci and non-polymorphic loci; c) amplifying the target loci; and d) measuring an amount of transplant DNA and an amount of recipient DNA in the recipient blood sample; wherein a change in levels of dd-cfDNA over a time interval is indicative of transplant status. In some embodiments, the method further comprises adjusting immunosuppressive therapy based on the levels of dd-cfDNA over the time interval. In some embodiments, an increase in the levels of dd-cfDNA is indicative of transplant rejection and a need for adjusting immunosuppressive therapy. In some embodiments, no change or a decrease in the levels of dd-cfDNA indicates transplant tolerance or stability, and a need for adjusting immunosuppressive therapy.

In some embodiments, in the methods disclosed herein, the target loci are amplified in amplicons of about 50-100 bp in length, or about 60-80 bp in length. In some embodiments, the amplicons are about 65 bp in length.

In some embodiments, the methods disclosed herein further comprise measuring an amount of transplant DNA and an amount of recipient DNA in the recipient blood sample.

In some embodiments, the methods disclosed herein do not comprise genotyping the transplant donor and the transplant recipient.

In some embodiments, the methods disclosed herein further comprise detecting the amplified target loci using a microarray.

In some embodiments, in the methods disclosed herein, the polymorphic loci and the non-polymorphic loci are amplified in a single reaction.

In some embodiments, in the methods disclosed herein, the DNA is preferentially enriched at the target loci.

In some embodiments, preferentially enriching the DNA in the sample at the plurality of polymorphic loci includes obtaining a plurality of pre-circularized probes where each probe targets one of the polymorphic loci, and where the 3′ and 5′ ends of the probes are designed to hybridize to a region of DNA that is separated from the polymorphic site of the locus by a small number of bases, where the small number is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 to 25, 26 to 30, 31 to 60, or a combination thereof; hybridizing the pre-circularized probes to DNA from the sample; filling the gap between the hybridized probe ends using DNA polymerase; circularizing the pre-circularized probe; and amplifying the circularized probe.

In some embodiments, preferentially enriching the DNA at the plurality of polymorphic loci includes obtaining a plurality of ligation-mediated PCR probes where each PCR probe targets one of the polymorphic loci, and where the upstream and downstream PCR probes are designed to hybridize to a region of DNA, on one strand of DNA, that is separated from the polymorphic site of the locus by a small number of bases, where the small number is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 to 25, 26 to 30, 31 to 60, or a combination thereof; hybridizing the ligation-mediated PCR probes to the DNA from the first sample; filling the gap between the ligation-mediated PCR probe ends using DNA polymerase; ligating the ligation-mediated PCR probes; and amplifying the ligated ligation-mediated PCR probes.

In some embodiments, preferentially enriching the DNA at the plurality of polymorphic loci includes obtaining a plurality of hybrid capture probes that target the polymorphic loci, hybridizing the hybrid capture probes to the DNA in the sample, and physically removing some or all of the unhybridized DNA from the first sample of DNA.

In some embodiments, the hybrid capture probes are designed to hybridize to a region that is flanking but not overlapping the polymorphic site. In some embodiments, the hybrid capture probes are designed to hybridize to a region that is flanking but not overlapping the polymorphic site, and where the length of the flanking capture probe may be selected from the group consisting of less than about 120 bases, less than about 110 bases, less than about 100 bases, less than about 90 bases, less than about 80 bases, less than about 70 bases, less than about 60 bases, less than about 50 bases, less than about 40 bases, less than about 30 bases, and less than about 25 bases. In some embodiments, the hybrid capture probes are designed to hybridize to a region that overlaps the polymorphic site, and where the plurality of hybrid capture probes comprise at least two hybrid capture probes for each polymorphic locus, and where each hybrid capture probe is designed to be complementary to a different allele at that polymorphic locus.

In some embodiments, preferentially enriching the DNA at a plurality of polymorphic loci includes obtaining a plurality of inner forward primers where each primer targets one of the polymorphic loci, and where the 3′ ends of the inner forward primers are designed to hybridize to a region of DNA upstream from the polymorphic site, and separated from the polymorphic site by a small number of bases, where the small number is selected from the group consisting of 1, 2, 3, 4, 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25, 26 to 30, or 31 to 60 base pairs; optionally obtaining a plurality of inner reverse primers where each primer targets one of the polymorphic loci, and where the 3′ ends of the inner reverse primers are designed to hybridize to a region of DNA upstream from the polymorphic site, and separated from the polymorphic site by a small number of bases, where the small number is selected from the group consisting of 1, 2, 3, 4, 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25, 26 to 30, or 31 to 60 base pairs; hybridizing the inner primers to the DNA; and amplifying the DNA using the polymerase chain reaction to form amplicons.

In some embodiments, the method also includes obtaining a plurality of outer forward primers where each primer targets one of the polymorphic loci, and where the outer forward primers are designed to hybridize to the region of DNA upstream from the inner forward primer; optionally obtaining a plurality of outer reverse primers where each primer targets one of the polymorphic loci, and where the outer reverse primers are designed to hybridize to the region of DNA immediately downstream from the inner reverse primer; hybridizing the outer primers to the DNA; and amplifying the DNA using the polymerase chain reaction.

In some embodiments, the method also includes obtaining a plurality of outer reverse primers where each primer targets one of the polymorphic loci, and where the outer reverse primers are designed to hybridize to the region of DNA immediately downstream from the inner reverse primer; optionally obtaining a plurality of outer forward primers where each primer targets one of the polymorphic loci, and where the outer forward primers are designed to hybridize to the region of DNA upstream from the inner forward primer; hybridizing the outer primers to the DNA; and amplifying the DNA using the polymerase chain reaction.

In some embodiments, preparing the first sample further includes appending universal adapters to the DNA in the first sample and amplifying the DNA in the first sample using the polymerase chain reaction. In some embodiments, at least a fraction of the amplicons that are amplified are less than 100 bp, less than 90 bp, less than 80 bp, less than 70 bp, less than 65 bp, less than 60 bp, less than 55 bp, less than 50 bp, or less than 45 bp, and where the fraction is 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 99%.

In some embodiments, amplifying the DNA is done in one or a plurality of individual reaction volumes, and where each individual reaction volume contains more than 100 different forward and reverse primer pairs, more than 200 different forward and reverse primer pairs, more than 500 different forward and reverse primer pairs, more than 1,000 different forward and reverse primer pairs, more than 2,000 different forward and reverse primer pairs, more than 5,000 different forward and reverse primer pairs, more than 10,000 different forward and reverse primer pairs, more than 20,000 different forward and reverse primer pairs, more than 50,000 different forward and reverse primer pairs, or more than 100,000 different forward and reverse primer pairs.

In some embodiments, preparing the sample further comprises dividing the sample into a plurality of portions, and where the DNA in each portion is preferentially enriched at a subset of the plurality of polymorphic loci. In some embodiments, the inner primers are selected by identifying primer pairs likely to form undesired primer duplexes and removing from the plurality of primers at least one of the pair of primers identified as being likely to form undesired primer duplexes. In some embodiments, the inner primers contain a region that is designed to hybridize either upstream or downstream of the targeted polymorphic locus, and optionally contain a universal priming sequence designed to allow PCR amplification. In some embodiments, at least some of the primers additionally contain a random region that differs for each individual primer molecule. In some embodiments, at least some of the primers additionally contain a molecular barcode.
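The primer-selection step above (identify pairs likely to form duplexes, then remove at least one primer of each such pair) can be sketched with a simple string heuristic. A pair is flagged when the 3′-terminal k-mer of either primer has its reverse complement somewhere in the other primer. The sketch then greedily drops the primer involved in the most flagged pairs until none remain. The k = 5 window, the greedy removal order, and the omission of self-dimers and thermodynamic (ΔG) scoring are simplifying assumptions; practical primer-design tools use more detailed duplex models.

```python
from __future__ import annotations

from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def may_dimerize(p: str, q: str, k: int = 5) -> bool:
    """Flag a pair if the 3' k-mer of either primer can anneal to the other."""
    return revcomp(p[-k:]) in q or revcomp(q[-k:]) in p


def remove_dimer_primers(primers: dict[str, str], k: int = 5) -> dict[str, str]:
    """Greedily drop the primer in the most flagged pairs until none remain."""
    kept = dict(primers)
    while True:
        counts = {name: 0 for name in kept}
        for a, b in combinations(kept, 2):
            if may_dimerize(kept[a], kept[b], k):
                counts[a] += 1
                counts[b] += 1
        worst = max(counts, key=counts.get, default=None)
        if worst is None or counts[worst] == 0:
            return kept
        del kept[worst]
```

Removing the most-connected primer first tends to discard fewer primers than removing one member of each flagged pair independently.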

In some embodiments, the method comprises: (a) performing multiplex polymerase chain reaction (PCR) on a nucleic acid sample comprising target loci to simultaneously amplify at least 1,000 distinct target loci using either (i) at least 1,000 different primer pairs, or (ii) at least 1,000 target-specific primers and a universal or tag-specific primer, in a single reaction volume to produce amplified products comprising target amplicons; and (b) sequencing the amplified products. In some embodiments, the method does not comprise using a microarray.

In some embodiments, the method comprises: (a) performing multiplex polymerase chain reaction (PCR) on the cell-free DNA sample comprising target loci to simultaneously amplify at least 1,000 distinct target loci using either (i) at least 1,000 different primer pairs, or (ii) at least 1,000 target-specific primers and a universal or tag-specific primer, in a single reaction volume to produce amplified products comprising target amplicons; and (b) sequencing the amplified products. In some embodiments, the method does not comprise using a microarray.

In some embodiments, the method also includes obtaining genotypic data from one or both of the transplant donor and the transplant recipient. In some embodiments, obtaining genotypic data from one or both of the transplant donor and the transplant recipient includes preparing the DNA from the donor and the recipient, where the preparing comprises preferentially enriching the DNA at the plurality of polymorphic loci to give prepared DNA, optionally amplifying the prepared DNA, and measuring the DNA in the prepared sample at the plurality of polymorphic loci.

In some embodiments, building a joint distribution model for the expected allele count probabilities of the plurality of polymorphic loci on the chromosome is done using the obtained genotypic data from one or both of the transplant donor and the transplant recipient. In some embodiments, the first sample has been isolated from transplant recipient plasma, and the genotypic data of the transplant recipient are obtained by estimating the recipient genotypic data from the DNA measurements made on the prepared sample.

In some embodiments, preferential enrichment results in an average degree of allelic bias between the prepared sample and the first sample of a factor selected from the group consisting of no more than a factor of 2, no more than a factor of 1.5, no more than a factor of 1.2, no more than a factor of 1.1, no more than a factor of 1.05, no more than a factor of 1.02, no more than a factor of 1.01, no more than a factor of 1.005, no more than a factor of 1.002, no more than a factor of 1.001, and no more than a factor of 1.0001. In some embodiments, the plurality of polymorphic loci are SNPs. In some embodiments, measuring the DNA in the prepared sample is done by sequencing.
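One way to make the "average degree of allelic bias" concrete is sketched below. The definition used here is an assumption for illustration: per locus, the bias factor is the fold change in the A:B allele ratio between the first sample and the prepared sample, folded so that a shift in either direction is reported as a value of at least 1, and the average is taken as a geometric mean across loci. All counts are assumed to be nonzero.

```python
import math


def allelic_bias_factor(orig_a, orig_b, prep_a, prep_b):
    """Fold change of the A:B allele ratio from the first sample to the prepared
    sample, folded so that a 2x shift in either direction returns 2.0."""
    ratio = (prep_a / prep_b) / (orig_a / orig_b)
    return max(ratio, 1.0 / ratio)


def mean_allelic_bias(loci):
    """Geometric mean of per-locus bias factors; 1.0 means no allelic bias.
    Each locus is (orig_a, orig_b, prep_a, prep_b) with nonzero counts."""
    logs = [math.log(allelic_bias_factor(*locus)) for locus in loci]
    return math.exp(sum(logs) / len(logs))


loci = [(50, 50, 100, 100), (50, 50, 200, 100)]
print(round(mean_allelic_bias(loci), 4))  # 1.4142
```

Under this hypothetical definition, a preparation satisfying "no more than a factor of 1.1" would return a value of at most 1.1.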

In some embodiments, a diagnostic box is disclosed for helping to determine transplant status in a transplant recipient, where the diagnostic box is capable of executing the preparing and measuring steps of the disclosed methods.

In some embodiments, the allele counts are probabilistic rather than binary. In some embodiments, measurements of the DNA in the prepared sample at the plurality of polymorphic loci are also used to determine whether or not the transplant has inherited one or a plurality of linked haplotypes.

In some embodiments, building a joint distribution model for allele count probabilities is done by using data about the probability of chromosomes crossing over at different locations in a chromosome to model dependence between polymorphic alleles on the chromosome. In some embodiments, building a joint distribution model for allele counts and the step of determining the relative probability of each hypothesis are done using a method that does not require the use of a reference chromosome.

In some embodiments, determining the relative probability of each hypothesis makes use of an estimated fraction of donor-derived cell-free DNA (dd-cfDNA) in the prepared sample. In some embodiments, the DNA measurements from the prepared sample used in calculating allele count probabilities and determining the relative probability of each hypothesis comprise primary genetic data. In some embodiments, selecting the transplant status corresponding to the hypothesis with the greatest probability is carried out using maximum likelihood estimates or maximum a posteriori estimates.
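The choice between maximum likelihood and maximum a posteriori selection can be sketched as follows. The hypothesis labels, log-likelihood values, and priors are hypothetical inputs; in practice they would come from the joint distribution model. Without priors, the normalized values are relative likelihoods (equivalent to a flat prior), and the selected hypothesis is the maximum likelihood estimate.

```python
import math


def select_hypothesis(log_likelihoods, log_priors=None):
    """Return (selected hypothesis, normalized probabilities).

    Without priors this is a maximum likelihood selection (flat prior);
    with log priors it is a maximum a posteriori (MAP) selection."""
    scores = {
        h: ll + (log_priors[h] if log_priors is not None else 0.0)
        for h, ll in log_likelihoods.items()
    }
    top = max(scores.values())
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = {h: math.exp(s - top) for h, s in scores.items()}
    total = sum(weights.values())
    probs = {h: w / total for h, w in weights.items()}
    return max(probs, key=probs.get), probs


lls = {"stable": -10.0, "rejection": -12.0}  # hypothetical values
print(select_hypothesis(lls)[0])  # stable
priors = {"stable": math.log(0.01), "rejection": math.log(0.99)}
print(select_hypothesis(lls, priors)[0])  # rejection
```

With these hypothetical numbers, maximum likelihood selects "stable", while a strong prior toward rejection changes the MAP selection, illustrating how priors can shift the call.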

In some embodiments, calling the transplant status also includes combining the relative probabilities of each of the status hypotheses determined using the joint distribution model and the allele count probabilities with relative probabilities of each of the status hypotheses that are calculated using statistical techniques taken from a group consisting of a read count analysis, comparing heterozygosity rates, a statistic that is only available when parental genetic information is used, the probability of normalized genotype signals for certain donor/recipient contexts, a statistic that is calculated using an estimated transplant fraction of the first sample or the prepared sample, and combinations thereof.

In some embodiments, a confidence estimate is calculated for the called transplant status. In some embodiments, the method also includes taking a clinical action based on the called transplant status.

In some embodiments, a report displaying a determined transplant status is generated using the method. In some embodiments, a kit is disclosed for determining a transplant status designed to be used with the methods disclosed herein, the kit including a plurality of inner forward primers and optionally a plurality of inner reverse primers, where each of the primers is designed to hybridize to the region of DNA immediately upstream and/or downstream from one of the polymorphic sites on the target chromosome, and optionally additional chromosomes, where the region of hybridization is separated from the polymorphic site by a small number of bases, where the small number is selected from the group consisting of 1, 2, 3, 4, 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25, 26 to 30, 31 to 60, and combinations thereof.

In some embodiments, the methods disclosed herein comprise a selection step to select for shorter cfDNA.

In some embodiments, the methods disclosed herein comprise a universal amplification step to enrich for cfDNA.

In some embodiments, an amount of dd-cfDNA above a cutoff threshold is indicative of acute rejection of the transplant. Machine learning may be used to distinguish rejection from non-rejection.

In some embodiments, the cutoff threshold value is expressed as the percentage of dd-cfDNA (dd-cfDNA %) in the blood sample.

In some embodiments, the cutoff threshold value is expressed as the copy number of dd-cfDNA per volume unit of the blood sample.

In some embodiments, the cutoff threshold value is expressed as the copy number of dd-cfDNA per volume unit of the blood sample multiplied by the body mass or blood volume of the transplant recipient.

In some embodiments, the cutoff threshold value takes into account the body mass or blood volume of the patient.
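The three ways of expressing the cutoff threshold can be related with a short sketch. It assumes that the donor fraction and the total cfDNA concentration (ng/mL plasma) have already been measured, and it converts mass to genome copies using roughly 3.3 pg per haploid human genome; that conversion factor, the function name, and the choice of body mass (rather than blood volume) as the multiplier are assumptions for illustration.

```python
PG_PER_HAPLOID_GENOME = 3.3  # assumed approximate mass of one haploid human genome (pg)


def dd_cfdna_metrics(donor_fraction, cfdna_ng_per_ml, body_mass_kg,
                     pg_per_copy=PG_PER_HAPLOID_GENOME):
    """Express one measurement in the three threshold units discussed above.

    donor_fraction: estimated dd-cfDNA fraction (0.01 means 1%).
    cfdna_ng_per_ml: total cfDNA concentration in plasma (ng/mL).
    body_mass_kg: recipient body mass used as the multiplier."""
    total_copies_per_ml = cfdna_ng_per_ml * 1000.0 / pg_per_copy  # ng -> pg -> copies
    donor_copies_per_ml = donor_fraction * total_copies_per_ml
    return {
        "dd_cfdna_percent": 100.0 * donor_fraction,
        "donor_copies_per_ml": donor_copies_per_ml,
        "donor_copies_per_ml_x_kg": donor_copies_per_ml * body_mass_kg,
    }


m = dd_cfdna_metrics(donor_fraction=0.01, cfdna_ng_per_ml=6.6, body_mass_kg=70)
print({k: round(v, 2) for k, v in m.items()})
# {'dd_cfdna_percent': 1.0, 'donor_copies_per_ml': 20.0, 'donor_copies_per_ml_x_kg': 1400.0}
```

The same dd-cfDNA % can correspond to very different donor copies/mL when total cfDNA varies, which is why the copy-based units can behave differently as threshold metrics.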

In some embodiments, the cutoff threshold value takes into account one or more of the following: donor genome copies per volume of plasma, cell-free DNA yield per volume of plasma, donor height, donor weight, donor age, donor gender, donor ethnicity, donor organ mass, donor organ, live vs deceased donor, related vs unrelated donor, recipient height, recipient weight, recipient age, recipient gender, recipient ethnicity, creatinine, eGFR (estimated glomerular filtration rate), cfDNA methylation, DSA (donor-specific antibodies), KDPI (kidney donor profile index), medications (immunosuppression, steroids, blood thinners, etc.), infections (BKV, EBV, CMV, UTI), recipient and/or donor HLA alleles or epitope mismatches, Banff classification of renal allograft pathology, and for-cause vs surveillance or protocol biopsy.

In some embodiments, the cutoff threshold value is scaled according to the amount of total cfDNA in the blood sample.

In some embodiments, the method has a sensitivity of at least 80% in identifying acute rejection (AR) over non-AR when the dd-cfDNA amount is above the cutoff threshold value scaled according to the amount of total cfDNA in the blood sample and a confidence interval of 95%.

In some embodiments, the method has a specificity of at least 70% in identifying acute rejection (AR) over non-AR when the dd-cfDNA amount is above the cutoff threshold value scaled according to the amount of total cfDNA in the blood sample and a confidence interval of 95%.

In some embodiments, the method has a sensitivity of at least 80% in identifying acute rejection (AR) over non-AR when the dd-cfDNA amount is above the cutoff threshold value scaled according to the amount of total cfDNA in the blood sample and a confidence interval of 95%. In some embodiments, the method has a sensitivity of at least 85% in identifying acute rejection (AR) over non-AR when the dd-cfDNA amount is above the cutoff threshold value scaled according to the amount of total cfDNA in the blood sample and a confidence interval of 95%. In some embodiments, the method has a sensitivity of at least 90% in identifying acute rejection (AR) over non-AR when the dd-cfDNA amount is above the cutoff threshold value scaled according to the amount of total cfDNA in the blood sample and a confidence interval of 95%. In some embodiments, the method has a sensitivity of at least 95% in identifying acute rejection (AR) over non-AR when the dd-cfDNA amount is above the cutoff threshold value scaled according to the amount of total cfDNA in the blood sample and a confidence interval of 95%.

In some embodiments, the method has a specificity of at least 70% in identifying acute rejection (AR) over non-AR when the dd-cfDNA amount is above the cutoff threshold value scaled according to the amount of total cfDNA in the blood sample and a confidence interval of 95%. In some embodiments, the method has a specificity of at least 75% in identifying acute rejection (AR) over non-AR when the dd-cfDNA amount is above the cutoff threshold value scaled according to the amount of total cfDNA in the blood sample and a confidence interval of 95%. In some embodiments, the method has a specificity of at least 85% in identifying acute rejection (AR) over non-AR when the dd-cfDNA amount is above the cutoff threshold value scaled according to the amount of total cfDNA in the blood sample and a confidence interval of 95%. In some embodiments, the method has a specificity of at least 90% in identifying acute rejection (AR) over non-AR when the dd-cfDNA amount is above the cutoff threshold value scaled according to the amount of total cfDNA in the blood sample and a confidence interval of 95%. In some embodiments, the method has a specificity of at least 95% in identifying acute rejection (AR) over non-AR when the dd-cfDNA amount is above the cutoff threshold value scaled according to the amount of total cfDNA in the blood sample and a confidence interval of 95%.
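Sensitivity and specificity at a given cutoff can be computed from labelled samples as in the sketch below, which also attaches a 95% Wilson score interval to a proportion. The sample values and labels are invented for illustration only, the cutoff is applied as "strictly above", and both classes are assumed to be non-empty; this is not the analysis used to generate any reported performance figure.

```python
import math


def sensitivity_specificity(values, is_ar, cutoff):
    """Call AR when a value is strictly above the cutoff; return rates and counts."""
    tp = sum(1 for v, y in zip(values, is_ar) if y and v > cutoff)
    fn = sum(1 for v, y in zip(values, is_ar) if y and v <= cutoff)
    tn = sum(1 for v, y in zip(values, is_ar) if not y and v <= cutoff)
    fp = sum(1 for v, y in zip(values, is_ar) if not y and v > cutoff)
    return tp / (tp + fn), tn / (tn + fp), (tp, fn, tn, fp)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical dd-cfDNA % values with AR labels (not study data).
values = [1.5, 2.0, 3.0, 0.9, 0.3, 0.5, 0.8, 1.2]
is_ar = [True, True, True, True, False, False, False, False]
sens, spec, (tp, fn, tn, fp) = sensitivity_specificity(values, is_ar, cutoff=1.0)
print(sens, spec)  # 0.75 0.75
print(wilson_interval(tp, tp + fn))
```

With only four AR samples the interval around a sensitivity of 0.75 is very wide (roughly 0.30 to 0.95), which shows why sample size matters when stating performance with a confidence interval.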

BRIEF DESCRIPTION OF THE DRAWINGS

The presently disclosed embodiments will be further explained with reference to the attached drawings, wherein like structures are referred to by like numerals throughout the several views. The drawings shown are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the presently disclosed embodiments.

FIG. 1 exemplifies how DNA released from transplanted kidneys into the bloodstream is elevated in acute graft rejection.

FIG. 2 exemplifies the high capacity that dd-cfDNA demonstrates for detection of kidney transplant rejection. Using a threshold of 1% dd-cfDNA, a sensitivity of 92.3%, a specificity of 72.9%, and an AUC of 0.9 are achieved.

FIG. 3 exemplifies the % dd-cfDNA among kidney transplant recipients that were either stable, undergoing acute rejection, undergoing borderline rejection, or experiencing other transplant injury.

FIG. 4 exemplifies the ability of the disclosed methods to detect either borderline or acute transplant rejections where the transplants are undergoing either antibody-mediated rejection (ABMR) or T-cell mediated rejection (TCMR).

FIG. 5 exemplifies the clinical relevance of detecting dd-cfDNA, as disclosed herein, for detection of transplant rejection immediately following surgery.

FIG. 6 exemplifies the value of repeated measurements within individual transplant recipient patients following transplantation surgery.

FIG. 7 exemplifies the ability of serum creatinine levels to discriminate between transplants undergoing acute rejection (AR) and those not undergoing acute rejection (Non-AR).

FIG. 8 is a flow-chart illustrating a conventional approach to mutation calling and a motif-specific approach to mutation calling.

FIG. 9 illustrates one or more implementations of modelling a sample preparation process.

FIG. 10 illustrates a block diagram of one or more implementations of an error analysis system.

FIG. 11 illustrates one or more implementations of a method for calling a mutation using a motif-specific error model.

FIG. 12 illustrates one or more implementations of a method for determining a mutation fraction.

FIG. 13: Plasma Sample Breakdown.

FIG. 14A-C: Discrimination of active rejection by dd-cfDNA (A) versus creatinine (B) and eGFR (C). Boxes indicate interquartile range (25th to 75th percentile); horizontal lines in boxes represent medians; dots indicate outliers >1.5 times the upper quartile value. For Panel C, eGFR values were only calculated for 200 samples due to the availability of data; the non-AR group for eGFR analysis included 79 borderline, 65 other injury, and 7 stable samples. P-values for dd-cfDNA adjusted using Kruskal-Wallis rank sum test followed by Dunn multiple comparison tests with Holm correction; P-values for creatinine and eGFR adjusted via Tukey's test.

FIG. 15A-C: Predictive statistics for acute rejection versus non-acute rejection.

FIG. 16: Predictive statistics for acute rejection versus stable. Boxes indicate inter-quartile range, horizontal lines represent medians.

FIG. 17: dd-cfDNA as a function of antibody-mediated versus T-cell-mediated rejection. Boxes indicate interquartile range (25th to 75th percentile); horizontal lines in boxes represent medians; dots indicate all individual data points. P-values for dd-cfDNA adjusted using Kruskal-Wallis rank sum test followed by Dunn multiple comparison tests with Holm correction. ABMR, antibody-mediated rejection; b, borderline; TCMR, T-cell-mediated rejection.

FIG. 18A-F: Modeling dd-cfDNA as a function of Banff scores. Six (of 15) histological features with significant differences in dd-cfDNA level by Banff scores are shown here (P<0.01 for all). Boxes indicate interquartile range (25th to 75th percentile); horizontal lines in boxes represent medians; dots indicate all individual data points by rejection status. P-values for dd-cfDNA adjusted using Kruskal-Wallis rank sum test followed by Dunn multiple comparison tests with Holm correction.

FIG. 19: Relationship between dd-cfDNA and donor type. No significant difference by donor type was observed (P>0.46). P-values for dd-cfDNA adjusted using Kruskal-Wallis rank sum test followed by Dunn multiple comparison tests with Holm correction.

FIG. 20A-B: Variability in dd-cfDNA over time. (A) Inter-patient variability (60 samples from 60 patients over time). (B) Intra-patient variability (samples from the same 10 patients over time).

FIG. 21A-D: dd-cfDNA Levels over Time in Patients with Acute Rejection.

FIG. 22: Flow diagram of the experimental design

FIG. 23A-D: Histograms of measured donor fractions. FIG. 23A shows measured donor fractions for related samples from Lot 1. FIG. 23B shows measured donor fractions for unrelated samples from Lot 1. FIG. 23C shows measured donor fractions for related samples from Lot 2. FIG. 23D shows measured donor fractions for unrelated samples from Lot 2.

FIG. 24A-B: Graphs showing measured percent CV values as a function of the corresponding percent empirical means for related samples (A) and unrelated samples (B).

FIG. 25A-C: Graphs showing measured donor fractions as a function of the corresponding attempted spike levels, along with the calculated linear fit for related cases only (A), for unrelated cases only (B), and for related and unrelated cases together (C).

FIG. 26A-C: Graphs showing measured donor fractions as a function of the corresponding attempted spike levels on log-log scale for related cases only (A), for unrelated cases only (B), and for related and unrelated cases together (C).

FIG. 27A-C: Graphs showing measured donor fractions as a function of the corresponding ddPCR values, along with the calculated linear fit for related cases only (A), unrelated cases only (B), and related and unrelated cases together (C).

FIG. 28A-B: Graphs showing measured donor fractions from Lot 2 as a function of the values from Lot 1 on linear scale, along with the calculated linear fit (A), and on log-log scale (B).

FIG. 29A-D: Graphs showing histograms of measured donor fractions for: related gDNA (A), unrelated gDNA (B), related cfDNA (C), and unrelated cfDNA samples (D).

FIG. 30A-D: Graphs showing histograms of centered, measured donor fractions for: related samples from Lot 1 (A), related samples from Lot 2 (B), unrelated samples from Lot 1 (C), and unrelated samples from Lot 2 (D).

FIG. 31A-B: Graphs depicting empirical standard deviations as a function of the corresponding empirical means for: related samples from Lot 1 and Lot 2 (A), and unrelated samples from Lot 1 and Lot 2 (B).

FIG. 32A-B: Graphs depicting measured percent CV values as a function of the corresponding percent empirical means, particularized with respect to input amount, for gDNA samples: from related samples (A) and from unrelated samples (B).

FIG. 33A-B: Graphs depicting measured percent CV values as a function of the corresponding percent empirical means for cfDNA samples: from related samples (A) and from unrelated samples (B).

FIG. 34A-C: Graphs depicting measured donor fractions as a function of the corresponding donor fraction values measured using HNR, along with the calculated linear fit for related cases only (A), for unrelated cases only (B), and for both related and unrelated cases (C).

FIG. 35A-C: Graphs depicting measured donor fractions as a function of the corresponding attempted spike levels, along with the calculated linear fit, for gDNA samples from related cases only (A), from unrelated cases only (B), and from both related and unrelated cases together (C).

FIG. 36A-C: Graphs depicting measured donor fractions as a function of the corresponding attempted spike levels on log-log scale for gDNA samples: from related cases only (A), from unrelated cases only (B), and from related and unrelated cases together (C).

FIG. 37A-C: Graphs depicting measured donor fractions as a function of the corresponding attempted spike levels, along with the calculated linear fit, for cfDNA samples from related cases only (A), from unrelated cases only (B), and from related and unrelated cases together (C).

FIG. 38A-C: Graphs depicting measured donor fractions as a function of the corresponding attempted spike levels on log-log scale for cfDNA samples: from related cases only (A), from unrelated cases only (B), and from related and unrelated cases together (C).

FIG. 39A-B: Graphs showing histograms of measured donor fractions for (A) 0.6% spike level and (B) 2.4% spike level.

FIG. 40A-B: Accuracy assessment of KidneyScan (A) and the Grskovic et al. assay (B).

FIG. 41: Discrimination of active rejection by dd-cfDNA in biopsy-matched samples (data stratified by biopsy type). Boxes indicate inter-quartile range, horizontal lines represent medians.

FIG. 42: Discrimination of active rejection by dd-cfDNA (A) versus eGFR (B). Boxes indicate interquartile range (25th to 75th percentile); horizontal lines in boxes represent medians; dots indicate outliers >1.5 times the upper quartile value. P-values for dd-cfDNA and eGFR using the Kruskal-Wallis rank sum test indicate a significant difference between the medians of the AR and non-rejection groups for both markers.

FIG. 43: dd-cfDNA as a function of antibody-mediated versus T-cell-mediated rejection. Boxes indicate interquartile range (25th to 75th percentile); horizontal lines in boxes represent medians; dots indicate all individual data points. P-values for dd-cfDNA adjusted using Kruskal-Wallis rank sum test followed by Dunn multiple comparison tests with Holm correction. (a) Samples assigned ABMR and bTCMR. (b) Samples assigned ABMR and TCMR. (c) Samples assigned TCMR and bABMR. ABMR, antibody-mediated rejection; b, borderline; TCMR, T-cell-mediated rejection.

FIG. 44: Relationship between dd-cfDNA and donor type. No significant difference by donor type was observed (P>0.46). P-values for dd-cfDNA adjusted using Kruskal-Wallis rank sum test followed by Dunn multiple comparison tests with Holm correction.

FIG. 45: Cumulative distributions of SNP minor allele frequency according to ethnicity.

FIG. 46: Allele ratios for SNPs on chromosomes 13, 18, 21 for a sample with 9% donor fraction. The SNPs between the black horizontal lines are removed from the calculation.

FIG. 47: Allele ratios for SNPs on chromosomes 13, 18, 21 for a sample with 0.4% donor fraction.

FIG. 48: Performance of using donor copies/mL and donor copies/mL*kg as the metric with a fixed threshold. Black arrows show protocol active rejections and T-cell mediated rejections missed by using dd-cfDNA % as the threshold metric.

FIG. 49: Graph depicting dd-cfDNA % (upper panel), donor copies/mL (middle panel), and donor copies/mL*kg (lower panel) from patient data as a function of ng cfDNA/mL plasma.

FIG. 50: Stratification of samples by cfDNA ng/mL amounts. As cfDNA ng/mL increases, both sensitivity and specificity increase for donor copies/mL and donor copies/mL*kg as the metric.

FIG. 51: Distribution of active rejection (AR) and non-rejection (NON_AR) samples across quartile (upper panel) and octile (lower panel) stratification of samples by cfDNA ng/mL amounts.

FIG. 52: Stratification of samples by cfDNA ng/mL amounts, further categorized based on determination of antibody mediated rejection (ABMR) or T-cell mediated rejection (TCMR). The panels show determination of ABMR or TCMR based on dd-cfDNA %, donor copies/mL, or donor copies/mL*kg threshold metrics, as indicated in each figure panel.

While the above-identified drawings set forth presently disclosed embodiments, other embodiments are also contemplated, as noted in the discussion. This disclosure presents illustrative embodiments by way of representation and not limitation. Numerous other modifications and embodiments can be devised by those skilled in the art which fall within the scope and spirit of the principles of the presently disclosed embodiments.

DETAILED DESCRIPTION

Disclosed herein are methods for detection of transplant donor-derived cell-free DNA (dd-cfDNA) in a sample from a transplant recipient.

In some embodiments, disclosed herein is a method of amplifying target loci of donor-derived cell-free DNA (dd-cfDNA) from a blood sample of a transplant recipient, the method comprising: a) extracting DNA from the blood sample of the transplant recipient, wherein the DNA comprises cell-free DNA derived from both the transplanted cells and from the transplant recipient, b) enriching the extracted DNA at target loci, wherein the target loci comprise 50 to 5000 target loci comprising polymorphic loci and non-polymorphic loci; and c) amplifying the target loci.

In some embodiments, disclosed herein is a method of detecting donor-derived cell-free DNA (dd-cfDNA) in a blood sample from a transplant recipient, the method comprising: a) extracting DNA from the blood sample of the transplant recipient, wherein the DNA comprises cell-free DNA derived from both the transplanted cells and from the transplant recipient, b) enriching the extracted DNA at target loci, wherein the target loci comprise 50 to 5000 target loci comprising polymorphic loci and non-polymorphic loci; c) amplifying the target loci; d) contacting the amplified target loci with probes that specifically hybridize to target loci; and e) detecting binding of the target loci with the probes, thereby detecting dd-cfDNA in the blood sample. In some embodiments, the probes are labelled with a detectable marker.

In some embodiments, disclosed herein is a method of determining the likelihood of transplant rejection within a transplant recipient, the method comprising: a) extracting DNA from a blood sample of the transplant recipient, wherein the DNA comprises cell-free DNA derived from both the transplanted cells and from the transplant recipient, b) enriching the extracted DNA at target loci, wherein the target loci comprise 50 to 5000 target loci comprising polymorphic loci and non-polymorphic loci; c) amplifying the target loci; and d) measuring an amount of transplant DNA and an amount of recipient DNA in the recipient blood sample; wherein a greater amount of dd-cfDNA indicates a greater likelihood of transplant rejection.

In some embodiments, disclosed herein is a method of diagnosing a transplant within a transplant recipient as undergoing acute rejection, the method comprising: a) extracting DNA from a blood sample of the transplant recipient, wherein the DNA comprises cell-free DNA derived from both the transplanted cells and from the transplant recipient, b) enriching the extracted DNA at target loci, wherein the target loci comprise 50 to 5000 target loci comprising polymorphic loci and non-polymorphic loci; c) amplifying the target loci; and d) measuring an amount of transplant DNA and an amount of recipient DNA in the recipient blood sample; wherein an amount of dd-cfDNA of greater than 1% indicates that the transplant is undergoing acute rejection.

In an embodiment, a method disclosed herein uses selective enrichment techniques that preserve the relative allele frequencies that are present in the original sample of DNA at each polymorphic locus from a set of polymorphic loci. In some embodiments, the amplification and/or selective enrichment technique may involve PCR such as ligation mediated PCR, fragment capture by hybridization, molecular inversion probes, or other circularizing probes. In some embodiments, methods for amplification or selective enrichment may involve using probes where, upon correct hybridization to the target sequence, the 3-prime end or 5-prime end of a nucleotide probe is separated from the polymorphic site of the allele by a small number of nucleotides. This separation reduces preferential amplification of one allele, termed allele bias. This is an improvement over methods that involve using probes where the 3-prime end or 5-prime end of a correctly hybridized probe is directly adjacent to or very near to the polymorphic site of an allele. In an embodiment, probes whose hybridizing region may contain, or is known to contain, a polymorphic site are excluded. Polymorphic sites at the site of hybridization can cause unequal hybridization or inhibit hybridization altogether in some alleles, resulting in preferential amplification of certain alleles. These embodiments are improvements over other methods that involve targeted amplification and/or selective enrichment in that they better preserve the original allele frequencies of the sample at each polymorphic locus, whether the sample is a pure genomic sample from a single individual or a mixture of individuals.
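The two probe placement rules described above (no polymorphism inside the hybridizing region, and a small gap between the probe's 3-prime end and the target polymorphic site) can be expressed as a simple filter. This sketch considers only a forward primer on the plus strand with 0-based, half-open coordinates; the function name and the default gap range of 1 to 30 bases are hypothetical.

```python
def forward_primer_ok(start, end, target_snp, known_snps, min_gap=1, max_gap=30):
    """Check a forward (+ strand) primer hybridizing to [start, end), 0-based.

    Rejects the primer if any known polymorphic site falls inside the
    hybridizing region, then requires the number of bases between the
    primer's 3-prime end (position end - 1) and the target site to lie in
    [min_gap, max_gap]."""
    if any(start <= s < end for s in known_snps):
        return False
    gap = target_snp - end  # bases strictly between the 3' end and the target site
    return min_gap <= gap <= max_gap


print(forward_primer_ok(100, 120, 125, [125]))       # True: 5-base gap, clean binding site
print(forward_primer_ok(100, 120, 125, [110, 125]))  # False: SNP at 110 under the primer
print(forward_primer_ok(100, 125, 125, [125]))       # False: 3' end directly adjacent
```

A reverse primer would use the mirrored computation (gap measured from the target site up to the primer's start coordinate); that case is omitted here for brevity.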

After blood draw and before DNA extraction, blood cells within a blood sample may burst and shed long fragments of DNA into the sample, which would increase the total amount of cell-free DNA (cfDNA) and background noise, distorting the % of dd-cfDNA detected. In order to reduce such background noise, and based on the observation that dd-cfDNA is typically shorter than DNA shed from a transplant recipient blood cell, two particular enrichments for dd-cfDNA are contemplated. In one embodiment, a size selection is applied to select for shorter cfDNA. In another embodiment, a universal amplification step is applied to reduce noise (e.g., before applying multiplex PCR), based on the hypothesis that shorter dd-cfDNA (often in mononucleosome form) is amplified more efficiently than longer transplant recipient-derived DNA.
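The effect of size selection on the measured dd-cfDNA fraction can be illustrated with simulated fragments. In this toy example each fragment carries a known source label, which real data would not have; the 200-base cutoff and the fragment values are hypothetical and chosen only to show the direction of the effect.

```python
def size_select(fragments, max_len=200):
    """Keep fragments no longer than max_len bases; each fragment is (length, source)."""
    return [f for f in fragments if f[0] <= max_len]


def donor_fraction(fragments):
    """Fraction of fragments labelled 'donor' (labels exist only in simulated data)."""
    return sum(1 for _, src in fragments if src == "donor") / len(fragments)


# Simulated plasma: a short donor fragment, short recipient cfDNA, and long
# recipient DNA released by lysed blood cells after the draw.
fragments = [(150, "donor"), (160, "recipient"), (170, "recipient"),
             (2000, "recipient"), (5000, "recipient")]
print(donor_fraction(fragments))               # 0.2
print(donor_fraction(size_select(fragments)))  # 0.3333333333333333
```

Removing the long, lysis-derived fragments restores the donor fraction of the short cfDNA pool, which is the distortion the size selection step is intended to reduce.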

In an embodiment, a method disclosed herein uses highly efficient, highly multiplexed targeted PCR to amplify DNA, followed by high-throughput sequencing to determine the allele frequencies at each target locus. The ability to multiplex more than about 50 or 100 PCR primers in one reaction in a way that most of the resulting sequence reads map to targeted loci is novel and non-obvious. One technique that allows highly multiplexed targeted PCR to perform in a highly efficient manner involves designing primers that are unlikely to hybridize with one another. The PCR probes, typically referred to as primers, are selected by creating a thermodynamic model of potentially adverse interactions between at least 500, at least 1,000, at least 5,000, at least 10,000, at least 20,000, at least 50,000, or at least 100,000 potential primer pairs, or unintended interactions between primers and sample DNA, and then using the model to eliminate designs that are incompatible with the other designs in the pool. Another technique that allows highly multiplexed targeted PCR to perform in a highly efficient manner is using a partial or full nesting approach to the targeted PCR. Using one or a combination of these approaches allows multiplexing of at least 300, at least 800, at least 1,200, at least 4,000, or at least 10,000 primers in a single pool, with the resulting amplified DNA comprising a majority of DNA molecules that, when sequenced, will map to targeted loci. Using one or a combination of these approaches allows multiplexing of a large number of primers in a single pool, with the resulting amplified DNA comprising greater than 50%, greater than 80%, greater than 90%, greater than 95%, greater than 98%, or greater than 99% DNA molecules that map to targeted loci.
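The step of using an interaction model to eliminate incompatible designs can be sketched as a greedy pruning loop. The sketch assumes pairwise interaction free energies have already been predicted by some thermodynamic model; the -6.0 kcal/mol threshold, the rule of removing the primer involved in the most adverse interactions, and the example scores are hypothetical.

```python
from collections import Counter


def prune_pool(primers, pair_scores, threshold=-6.0):
    """Greedily remove primers until no adverse interaction remains.

    pair_scores maps frozenset({a, b}) to a predicted interaction free energy
    (kcal/mol, assumed precomputed); values below threshold count as adverse.
    Each round removes the primer involved in the most adverse pairs."""
    pool = set(primers)
    while True:
        adverse = [pair for pair, dg in pair_scores.items()
                   if dg < threshold and pair <= pool]
        if not adverse:
            return pool
        counts = Counter(p for pair in adverse for p in pair)
        worst = max(sorted(counts), key=counts.get)  # sorted for deterministic ties
        pool.discard(worst)


scores = {frozenset({"A", "B"}): -9.0, frozenset({"A", "C"}): -8.0,
          frozenset({"B", "D"}): -2.0, frozenset({"C", "D"}): -1.0}
print(sorted(prune_pool(["A", "B", "C", "D"], scores)))  # ['B', 'C', 'D']
```

Removing the most-connected primer first tends to resolve several adverse pairs at once, so fewer designs are lost than when one primer is dropped per flagged pair.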

In an embodiment, a method disclosed herein yields a quantitative measure of the number of independent observations of each allele at a polymorphic locus. This is unlike most methods, such as microarrays or qualitative PCR, which provide information about the ratio of two alleles but do not quantify the number of independent observations of either allele. Even with methods that do provide quantitative information regarding the number of independent observations, only the ratio is utilized in the relevant determinations, and the quantitative information itself is not used. To illustrate the importance of retaining information about the number of independent observations, consider a sample locus with two alleles, A and B. In a first experiment, twenty A alleles and twenty B alleles are observed; in a second experiment, 200 A alleles and 200 B alleles are observed. In both experiments the ratio A/(A+B) is equal to 0.5; however, the second experiment conveys more information than the first about the certainty of the frequency of the A or B allele. Some methods known in the prior art involve averaging or summing allele ratios (channel ratios) (i.e., x_i/y_i) from individual alleles and analyzing this ratio, either by comparing it to a reference chromosome or by using a rule pertaining to how this ratio is expected to behave in particular situations. No allele weighting is implied in such methods known in the art, where it is assumed that one can ensure about the same amount of PCR product for each allele and that all the alleles should behave the same way. Such a method has a number of disadvantages and, more importantly, precludes the use of a number of improvements that are described elsewhere in this disclosure.
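The difference between the two experiments can be quantified with the binomial standard error of the A-allele fraction, as in this minimal sketch (the function name is hypothetical). Both experiments give the same ratio of 0.5, but the standard error shrinks by a factor of the square root of 10 when the number of observations increases tenfold.

```python
import math


def allele_fraction_with_se(a_count, b_count):
    """A-allele fraction and its binomial standard error sqrt(p(1-p)/n)."""
    n = a_count + b_count
    p = a_count / n
    return p, math.sqrt(p * (1 - p) / n)


print(allele_fraction_with_se(20, 20))    # same fraction, larger standard error
print(allele_fraction_with_se(200, 200))  # same fraction, smaller standard error
```

A ratio-only summary discards exactly this difference in certainty.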

The use of a joint distribution model is different from, and a significant improvement over, methods that determine heterozygosity rates by treating polymorphic loci independently, in that the resultant determinations are of significantly higher accuracy. Without being bound by any particular theory, it is believed that one reason they are of higher accuracy is that the joint distribution model takes into account the linkage between SNPs. The purpose of using the concept of linkage when creating the expected distribution of allele measurements for one or more hypotheses is that it allows the creation of expected allele measurement distributions that correspond to reality considerably better than when linkage is not used.

One reason that ploidy determinations using a method that compares the observed allele measurements to theoretical hypotheses corresponding to possible transplant states are believed to be of higher accuracy is that, when sequencing is used to measure the alleles, this method can glean more information than other methods from alleles where the total number of reads is low; for example, a method that relies on calculating and aggregating allele ratios would produce disproportionately weighted stochastic noise. For example, imagine a case that involved measuring the alleles using sequencing, where there was a set of loci at which only five sequence reads were detected for each locus. In an embodiment, for each of the alleles, the data may be compared to the hypothesized allele distribution and weighted according to the number of sequence reads; therefore, the data from these measurements would be appropriately weighted and incorporated into the overall determination. This is in contrast to a method that involved quantitating a ratio of alleles at a heterozygous locus, as this method could only calculate ratios of 0%, 20%, 40%, 60%, 80%, or 100% as the possible allele ratios; none of these may be close to the expected allele ratios. In this latter case, the calculated allele ratios would either have to be discarded due to insufficient reads or else would have disproportionate weighting and introduce stochastic noise into the determination, thereby decreasing the accuracy of the determination. In an embodiment, the individual allele measurements may be treated as independent measurements, where the relationship between measurements made on alleles at the same locus is no different from the relationship between measurements made on alleles at different loci.
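The contrast between read-weighted likelihoods and averaged ratios can be shown with a toy example. The counts below are hypothetical: two deeply sequenced loci and one locus with only five reads, compared under two candidate expected allele fractions (0.05 and 0.10). Summing binomial log-likelihoods lets each locus contribute according to its read depth, whereas a simple mean of ratios gives the five-read locus the same weight as the 1,000-read loci.

```python
import math


def binom_loglik(k, n, p):
    """log P(k of n reads carry the allele | expected allele fraction p)."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)


def total_loglik(loci, p):
    """Sum over loci, so each locus contributes according to its read depth."""
    return sum(binom_loglik(k, n, p) for k, n in loci)


def mean_ratio(loci):
    """Unweighted average of per-locus allele ratios (the approach contrasted above)."""
    return sum(k / n for k, n in loci) / len(loci)


# Hypothetical (allele reads, total reads) at three loci.
loci = [(50, 1000), (52, 1000), (2, 5)]
for p in (0.05, 0.10):
    print(p, round(total_loglik(loci, p), 2))
print(round(mean_ratio(loci), 3))  # 0.167
```

Here the summed log-likelihood favors 0.05, consistent with the two deep loci, while the unweighted mean of ratios lies closer to 0.10 because of the single five-read locus.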

In an embodiment, a method disclosed herein demonstrates how observing allele distributions at polymorphic loci can be used to determine the state of a transplant with greater accuracy than methods in the prior art. In an embodiment, the method observes the quantitative allele information obtained on the transplant donor/recipient mixture and evaluates which hypothesis fits the data best, where the transplant state corresponding to the hypothesis with the best fit to the data is called as the correct transplant state. In an embodiment, a method disclosed herein also uses the degree of fit to generate a confidence that the called state is the correct transplant state. In an embodiment, a method disclosed herein involves using algorithms that analyze the distribution of alleles found for loci that have different genotypic contexts, and comparing the observed allele distributions to the expected allele distributions for different transplant states in each of those contexts. This is different from, and an improvement over, methods that cannot estimate the number of independent instances of each allele at each locus in a mixed sample.
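Calling the best-fitting hypothesis and attaching a confidence to it can be sketched as follows, under stated assumptions. Each hypothetical transplant state is represented only by a list of expected alt-allele fractions per locus; how those expectations are derived from each locus's genotypic context is not shown. The confidence is taken to be the best state's normalized likelihood, which equals its posterior probability under an assumed uniform prior over states.

```python
import math


def binomial_log_likelihood(alt_reads: int, total_reads: int, p: float) -> float:
    """Binomial log-probability of alt_reads out of total_reads at allele fraction p (0 < p < 1)."""
    log_choose = (math.lgamma(total_reads + 1)
                  - math.lgamma(alt_reads + 1)
                  - math.lgamma(total_reads - alt_reads + 1))
    return log_choose + alt_reads * math.log(p) + (total_reads - alt_reads) * math.log(1.0 - p)


def call_transplant_state(observations, hypotheses):
    """Pick the best-fitting hypothesis and attach a confidence to the call.

    observations: list of (alt_reads, total_reads), one per locus.
    hypotheses: hypothetical mapping of state label -> list of expected
        alt-allele fractions, one per locus.
    Returns (best_state, confidence, log_likelihoods).
    """
    log_liks = {
        state: sum(binomial_log_likelihood(a, n, p)
                   for (a, n), p in zip(observations, expected))
        for state, expected in hypotheses.items()
    }
    best = max(log_liks, key=log_liks.get)
    # Subtract the maximum before exponentiating to avoid underflow (log-sum-exp).
    norm = sum(math.exp(ll - log_liks[best]) for ll in log_liks.values())
    return best, 1.0 / norm, log_liks
```

For example, with observations `[(2, 100), (3, 100), (1, 100)]` and hypotheses `{"low_donor_fraction": [0.02] * 3, "elevated_donor_fraction": [0.10] * 3}`, the sketch calls `"low_donor_fraction"` with a confidence above 0.99.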

In an embodiment, a method disclosed herein uses a joint distribution model that assumes that the allele frequencies at each locus are multinomial (and thus binomial when SNPs are biallelic) in nature. In some embodiments the joint distribution model uses beta-binomial distributions. When a measuring technique, such as sequencing, provides a quantitative measure for each allele present at each locus, a binomial model can be applied to each locus, and the underlying allele frequencies and the confidence in those frequencies can be ascertained. With methods known in the art that generate transplant status calls from allele ratios, or methods in which quantitative allele information is discarded, the certainty in the observed ratio cannot be ascertained. The instant method is different from, and an improvement over, methods that calculate allele ratios and aggregate those ratios to make a transplant status call, since any method that involves calculating an allele ratio at a particular locus and then aggregating those ratios necessarily assumes that the measured intensities or counts that are indicative of the amount of DNA from any given allele or locus will be distributed in a Gaussian fashion. The method disclosed herein does not involve calculating allele ratios. In some embodiments, a method disclosed herein may involve incorporating the number of observations of each allele at a plurality of loci into a model. In some embodiments, a method disclosed herein may involve calculating the expected distributions themselves, allowing the use of a joint binomial distribution model, which may be more accurate than any model that assumes a Gaussian distribution of allele measurements. The likelihood that the binomial distribution model is significantly more accurate than the Gaussian distribution increases as the number of loci increases. For example, when fewer than 20 loci are interrogated, the likelihood that the binomial distribution model is significantly better is low.
However, when more than 100, or especially more than 400, or especially more than 1,000, or especially more than 2,000 loci are used, the binomial distribution model will have a very high likelihood of being significantly more accurate than the Gaussian distribution model, thereby resulting in a more accurate transplant status determination. The likelihood that the binomial distribution model is significantly more accurate than the Gaussian distribution also increases as the number of observations at each locus increases. For example, when fewer than 10 distinct sequences are observed at each locus, the likelihood that the binomial distribution model is significantly better is low. However, when more than 50 sequence reads, or especially more than 100 sequence reads, or especially more than 200 sequence reads, or especially more than 300 sequence reads are used for each locus, the binomial distribution model will have a very high likelihood of being significantly more accurate than the Gaussian distribution model, thereby resulting in a more accurate ploidy determination.
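The binomial and beta-binomial per-locus models mentioned above can be sketched as below. The text does not specify how the beta-binomial is parameterized; the mean/concentration form used here (alpha = mean × concentration, beta = (1 − mean) × concentration) is an assumption, as are the function names. Lower concentration means more locus-to-locus overdispersion; as concentration grows the beta-binomial approaches the binomial.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def binomial_log_pmf(k: int, n: int, p: float) -> float:
    """Binomial log-probability of k successes in n reads at fraction p (0 < p < 1)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def beta_binomial_log_pmf(k: int, n: int, mean: float, concentration: float) -> float:
    """Beta-binomial log-probability with an assumed mean/concentration parameterization."""
    alpha = mean * concentration
    beta = (1.0 - mean) * concentration
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def joint_log_likelihood(loci, expected_fractions, concentration=None):
    """Sum of per-locus log-likelihoods over (k, n) loci.

    Uses the binomial when concentration is None, otherwise the beta-binomial.
    """
    total = 0.0
    for (k, n), p in zip(loci, expected_fractions):
        if concentration is None:
            total += binomial_log_pmf(k, n, p)
        else:
            total += beta_binomial_log_pmf(k, n, p, concentration)
    return total
```

The overdispersed model assigns more probability to extreme counts than the binomial with the same mean. For 0 of 20 reads at mean 0.5, the beta-binomial with concentration 10 gives about 1e-3, while the binomial gives about 1e-6. This is one way such a model can tolerate amplification noise.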

In an embodiment, a method disclosed herein uses sequencing to measure the number of instances of each allele at each locus in a DNA sample. Each sequencing read may be mapped to a specific locus and treated as a binary sequence read; alternately, the probability of the identity of the read and/or the mapping may be incorporated as part of the sequence read, resulting in a probabilistic sequence read, that is, the probable whole or fractional number of sequence reads that map to a given locus. Using the binary counts or the probability of counts, it is possible to use a binomial distribution for each set of measurements, allowing a confidence interval to be calculated around the number of counts. This ability to use the binomial distribution allows for more accurate ploidy estimations and more precise confidence intervals to be calculated. This is different from, and an improvement over, methods that use intensities to measure the amount of an allele present, for example methods that use microarrays, or methods that make measurements using fluorescence readers to measure the intensity of fluorescently tagged DNA in electrophoretic bands.
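A confidence interval around a count-based allele fraction can be sketched as follows. The text does not name a specific interval. The Wilson score interval used here is one standard binomial choice and is an assumption, as is the function name. Because the formula only needs the observed fraction and the total, it also accepts fractional (probabilistic) read counts.

```python
import math


def wilson_interval(allele_count: float, total_count: float, z: float = 1.96):
    """Wilson score interval for the fraction of reads supporting one allele.

    allele_count may be fractional when reads are counted probabilistically.
    z = 1.96 gives an approximate 95% interval.
    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    p_hat = allele_count / total_count
    z2 = z * z
    denom = 1.0 + z2 / total_count
    centre = (p_hat + z2 / (2.0 * total_count)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1.0 - p_hat) / total_count
                                         + z2 / (4.0 * total_count * total_count))
    # Clamp to [0, 1] to guard against floating-point rounding at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)
```

With 5 of 100 reads the interval is roughly 0.02 to 0.11. With 50 of 1,000 reads, the same observed fraction gives a visibly narrower interval, reflecting the extra information carried by the higher count.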

In an embodiment, a method disclosed herein uses aspects of the present set of data to determine parameters for the estimated allele frequency distribution for that set of data. This is an improvement over methods that utilize a training set of data or prior sets of data to set parameters for the present expected allele frequency distributions, or possibly expected allele ratios. This is because different sets of conditions are involved in the collection and measurement of every genetic sample, and thus a method that uses data from the instant set of data to determine the parameters for the joint distribution model to be used in the transplant status determination for that sample will tend to be more accurate.
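As one illustration of deriving a model parameter from the sample itself, the sketch below estimates a per-sample substitution error rate from reads that show a base matching neither allele of a biallelic SNP. It is not the disclosed procedure. The function name and input layout are hypothetical. It also assumes that a sequencing error turns the true base into each of the three other bases with equal probability, so that only two thirds of errors are visible as "third-allele" reads.

```python
def estimate_error_rate(base_counts, allele_pairs):
    """Estimate a per-base substitution error rate from the sample's own reads.

    base_counts: list of dicts mapping base -> read count, one per biallelic SNP.
    allele_pairs: list of (allele_1, allele_2) for the same SNPs.
    Assumes errors are uniform over the three non-true bases, so only 2/3 of
    errors produce a base outside the SNP's two alleles; hence the 3/2 factor.
    """
    third_allele_reads = 0
    total_reads = 0
    for counts, (a1, a2) in zip(base_counts, allele_pairs):
        total_reads += sum(counts.values())
        third_allele_reads += sum(c for base, c in counts.items() if base not in (a1, a2))
    if total_reads == 0:
        raise ValueError("no reads supplied")
    return 1.5 * third_allele_reads / total_reads
```

An estimate like this, recomputed for every sample, could then feed the error term of the per-locus likelihood instead of a value fixed from a training set.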

In an embodiment, a method disclosed herein involves determining whether the distribution of observed allele measurements is indicative of transplant rejection status using a maximum likelihood technique. The use of a maximum likelihood technique is different from, and a significant improvement over, methods that use single hypothesis rejection techniques, in that the resulting determinations will be made with significantly higher accuracy. One reason is that single hypothesis rejection techniques set cutoff thresholds based on only one measurement distribution rather than two, meaning that the thresholds are usually not optimal. Another reason is that the maximum likelihood technique allows the cutoff threshold to be optimized for each individual sample, instead of determining a cutoff threshold to be used for all samples regardless of the particular characteristics of each individual sample. Another reason is that the use of a maximum likelihood technique allows the calculation of a confidence for each transplant status call. The ability to make a confidence calculation for each call allows a practitioner to know which calls are accurate and which are more likely to be wrong. In some embodiments, a wide variety of methods may be combined with a maximum likelihood estimation technique to enhance the accuracy of the transplant status calls. In an embodiment, the maximum likelihood technique may be used in combination with the method described in U.S. Pat. No. 7,888,017. In an embodiment, the maximum likelihood technique may be used in combination with the method of using targeted PCR amplification to amplify the DNA in the mixed sample, followed by sequencing and analysis using a read counting method such as that used by TANDEM DIAGNOSTICS, as presented at the International Congress of Human Genetics 2011, in Montreal in October 2011.
In an embodiment, a method disclosed herein involves estimating the donor fraction of DNA in the mixed sample and using that estimation to calculate both the transplant status call and the confidence of the transplant status call.
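A maximum likelihood estimate of the donor fraction can be sketched with a simple grid search. This is a minimal illustration under several assumptions that go beyond the text. The recipient is taken to be homozygous at each locus, and the donor's dose of the other allele (0.5 or 1.0) is treated as known. Read errors are modelled as a symmetric swap between the two alleles at a fixed hypothetical rate. A likelihood interval, taken as all grid values within 1.92 log-units of the maximum, stands in for the confidence that the text says may accompany the call.

```python
import math


def binomial_log_likelihood(alt_reads, total_reads, p):
    """Binomial log-probability of alt_reads out of total_reads at fraction p (0 < p < 1)."""
    log_choose = (math.lgamma(total_reads + 1) - math.lgamma(alt_reads + 1)
                  - math.lgamma(total_reads - alt_reads + 1))
    return log_choose + alt_reads * math.log(p) + (total_reads - alt_reads) * math.log(1.0 - p)


def estimate_donor_fraction(loci, error=0.001, grid_size=1000):
    """Grid-search maximum likelihood estimate of the donor fraction.

    loci: list of (donor_allele_reads, total_reads, donor_dose); donor_dose is the
        assumed-known fraction of donor alleles absent from the recipient (0.5 or 1.0).
    Returns (f_hat, (f_low, f_high)), the interval covering grid values whose
    log-likelihood is within 1.92 of the maximum (approximate 95% likelihood interval).
    """
    grid = [i / (2.0 * grid_size) for i in range(1, grid_size + 1)]  # 0.0005 .. 0.5
    profile = []
    for f in grid:
        ll = 0.0
        for k, n, dose in loci:
            true_fraction = f * dose
            # Observed fraction after symmetric read errors between the two alleles.
            p = true_fraction * (1.0 - error) + (1.0 - true_fraction) * error
            ll += binomial_log_likelihood(k, n, p)
        profile.append(ll)
    best_ll = max(profile)
    f_hat = grid[profile.index(best_ll)]
    within = [f for f, ll in zip(grid, profile) if best_ll - ll <= 1.92]
    return f_hat, (min(within), max(within))
```

For example, five loci with 20 of 1,000 reads at donor dose 0.5 plus five loci with 39 of 1,000 reads at donor dose 1.0 give an estimate of about 0.038. A finer grid or a numerical optimizer could replace the grid search without changing the idea.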

In an embodiment, a method disclosed herein takes into account the tendency for the data to be noisy and contain errors by attaching a probability to each measurement. The use of maximum likelihood techniques to choose the correct hypothesis from the set of hypotheses that were made using the measurement data with attached probabilistic estimates makes it more likely that the incorrect measurements will be discounted, and the correct measurements will be used in the calculations that lead to the transplant status call. More precisely, this method systematically reduces the influence of incorrectly measured data on the transplant status call determination. This is an improvement over methods where all data is assumed to be equally correct, or methods where outlying data is arbitrarily excluded from the calculations leading to a transplant status call. Existing methods using channel ratio measurements claim to extend the method to multiple SNPs by averaging individual SNP channel ratios. Not weighting individual SNPs by expected measurement variance based on the SNP quality and observed depth of read reduces the accuracy of the resulting statistic, which can significantly reduce the accuracy of the transplant status call, especially in borderline cases.
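Attaching a probability to each measurement can be sketched at the level of individual reads, using the Phred-scaled base quality reported by a sequencer to set a per-read error probability. This is a simplified illustration, not the disclosed model. It assumes a biallelic locus and assumes that a sequencing error always turns one allele into the other. The function names and the `b_fraction` parameter are hypothetical.

```python
import math


def phred_to_error(quality: float) -> float:
    """Convert a Phred-scaled base quality to an error probability."""
    return 10.0 ** (-quality / 10.0)


def reads_log_likelihood(reads, b_fraction: float) -> float:
    """Log-likelihood of individually observed reads at one biallelic locus.

    reads: list of (observed_allele, phred_quality), observed_allele 'A' or 'B'.
    b_fraction: hypothesized fraction of B alleles in the sample (0 < b_fraction < 1).
    Simplifying assumption: a sequencing error always turns one allele into the other.
    """
    total = 0.0
    for allele, quality in reads:
        e = phred_to_error(quality)
        # Probability that this read shows B, given its own error probability.
        prob_b = b_fraction * (1.0 - e) + (1.0 - b_fraction) * e
        total += math.log(prob_b if allele == "B" else 1.0 - prob_b)
    return total
```

A single B read at quality 30 shifts the log-likelihood ratio between `b_fraction=0.05` and `b_fraction=0.0001` by about 3.8. The same read at quality 10 shifts it by only about 0.3. Unreliable measurements are therefore discounted rather than either trusted fully or excluded outright.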

In an embodiment, a method disclosed herein does not presuppose knowledge of which SNPs or other polymorphic loci are heterozygous on the transplant. This method allows a ploidy call to be made in cases where paternal genotypic information is not available. This is an improvement over methods where the SNPs that are heterozygous must be known ahead of time in order to appropriately select loci to target, or to interpret the genetic measurements made on the donor/recipient DNA sample.
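One way to avoid presupposing the donor genotype is to sum the likelihood over all three possible donor genotypes at each locus. The sketch below does this under stated assumptions. The recipient genotype is treated as known. The donor genotype prior follows Hardy-Weinberg proportions from a hypothetical population allele frequency. The donor fraction and error rate are given. The function name and parameters are illustrative only.

```python
import math


def binomial_log_likelihood(alt_reads, total_reads, p):
    """Binomial log-probability of alt_reads out of total_reads at fraction p (0 < p < 1)."""
    log_choose = (math.lgamma(total_reads + 1) - math.lgamma(alt_reads + 1)
                  - math.lgamma(total_reads - alt_reads + 1))
    return log_choose + alt_reads * math.log(p) + (total_reads - alt_reads) * math.log(1.0 - p)


def locus_log_likelihood_unknown_donor(alt_reads, total_reads, recipient_alt_dose,
                                       donor_fraction, pop_alt_freq, error=0.001):
    """Log-likelihood at one locus, summed over the three possible donor genotypes.

    recipient_alt_dose: 0, 1 or 2 copies of the alt allele in the recipient (assumed known).
    pop_alt_freq: population alt-allele frequency (0 < q < 1) for an assumed
        Hardy-Weinberg prior on the unknown donor genotype.
    Returns (log_likelihood, posterior probability of each donor alt dose).
    """
    q = pop_alt_freq
    priors = {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}
    terms = {}
    for dose, prior in priors.items():
        # Expected alt fraction in the mixture, then symmetric read errors.
        mix = (1 - donor_fraction) * recipient_alt_dose / 2 + donor_fraction * dose / 2
        p = mix * (1 - error) + (1 - mix) * error
        terms[dose] = math.log(prior) + binomial_log_likelihood(alt_reads, total_reads, p)
    top = max(terms.values())
    log_lik = top + math.log(sum(math.exp(t - top) for t in terms.values()))
    posterior = {dose: math.exp(t - log_lik) for dose, t in terms.items()}
    return log_lik, posterior
```

With a homozygous recipient, a 10% donor fraction, and 5 of 100 alt reads, the posterior favours a heterozygous donor. That genotype is never supplied as input.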

The methods described herein are particularly advantageous when used on samples where a small amount of DNA is available, or where the percentage of donor-derived DNA is low. This is due to the correspondingly higher allele dropout rate that occurs when only a small amount of DNA is available, and/or the correspondingly higher donor allele dropout rate when the percentage of donor DNA in a mixed sample of donor and transplant recipient DNA is low. A high allele dropout rate, meaning that a large percentage of the alleles were not measured for the target individual, results in inaccurate donor fraction calculations and inaccurate transplant status determinations. Since methods disclosed herein may use a joint distribution model that takes into account the linkage in inheritance patterns between SNPs, significantly more accurate transplant status determinations may be made.

Further discussion of the points above may be found elsewhere in this document.

Non-Invasive Transplant Monitoring

The process of non-invasive transplant monitoring involves a number of steps. Some of the steps may include: (1) obtaining the genetic material from the transplant; (2) enriching the genetic material of the transplant that may be in a mixed sample, ex vivo; (3) amplifying the genetic material, ex vivo; (4) preferentially enriching specific loci in the genetic material, ex vivo; (5) measuring the genetic material, ex vivo; and (6) analyzing the genotypic data, on a computer and ex vivo. Methods to reduce these six and other relevant steps to practice are described herein. At least some of the method steps are not directly applied on the body. In an embodiment, the present disclosure relates to methods of treatment and diagnosis applied to tissue and other biological materials isolated and separated from the body. At least some of the method steps are executed on a computer.

The high accuracy of the methods disclosed herein is a result of an informatics approach to analysis of the genotype data, as described herein. Modern technological advances have resulted in the ability to measure large amounts of genetic information from a genetic sample using such methods as high throughput sequencing and genotyping arrays. The methods disclosed herein allow a clinician to take greater advantage of the large amounts of data available and to make a more accurate diagnosis of the status of a transplant in a recipient. The details of a number of embodiments are given below. Different embodiments may involve different combinations of the aforementioned steps. Various combinations of the different embodiments of the different steps may be used interchangeably.

In an embodiment, a blood sample is taken from a transplant recipient, and the free-floating DNA in the plasma of the transplant recipient's blood, which contains a mixture of DNA of transplant donor origin and DNA of transplant recipient origin, is isolated and used to determine the status of the transplant. In an embodiment, a method disclosed herein involves preferential enrichment of those DNA sequences in a mixture of DNA that correspond to polymorphic alleles in a way that the allele ratios and/or allele distributions remain mostly consistent upon enrichment. In an embodiment, a method disclosed herein involves highly efficient targeted PCR-based amplification such that a very high percentage of the resulting molecules correspond to targeted loci. In an embodiment, a method disclosed herein involves sequencing a mixture of DNA that contains both DNA of donor origin and DNA of recipient origin. In an embodiment, a method disclosed herein involves using measured allele distributions to determine the state of a transplant in a transplant recipient. In an embodiment, a method disclosed herein involves reporting the determined transplant state to a clinician. In an embodiment, a method disclosed herein involves taking a clinical action, such as altering immunosuppressive therapy in the transplant recipient.

This application makes reference to U.S. Utility application Ser. No. 15/727,428, filed Oct. 6, 2017 (U.S. Publication No. 20180025109); U.S. Utility application Ser. No. 11/603,406, filed Nov. 28, 2006 (US Publication No.: 20070184467); U.S. Utility application Ser. No. 12/076,348, filed Mar. 17, 2008 (US Publication No.: 20080243398); PCT Utility Application Serial No. PCT/US09/52730, filed Aug. 4, 2009 (PCT Publication No.: WO/2010/017214); PCT Utility Application Serial No. PCT/US10/050824, filed Sep. 30, 2010 (PCT Publication No.: WO/2011/041485); and U.S. Utility application Ser. No. 13/110,685, filed May 18, 2011. Some of the vocabulary used in this filing may have its antecedents in these references. Some of the concepts described herein may be better understood in light of the concepts found in these references.

Screening Transplant Recipient Blood Comprising Free Floating Donor DNA

In an embodiment, blood may be drawn from a transplant recipient. Research has shown that transplant recipient blood may contain a small amount of free-floating DNA derived from the transplant, in addition to free-floating DNA of transplant recipient origin. There are many methods known in the art to isolate cell-free DNA, or to create fractions enriched in cell-free DNA. For example, chromatography has been shown to create certain fractions that are enriched in cell-free DNA.

Once a sample of blood, plasma, or other fluid is in hand that was drawn in a relatively non-invasive manner and that contains an amount of donor-derived DNA (either cellular or free-floating, and either enriched in its proportion to the recipient-derived DNA or in its original ratio), one may genotype the DNA found in said sample. In some embodiments, the blood may be drawn using a needle to withdraw blood from a vein, for example, the basilic vein. The method described herein can be used to determine genotypic data of the transplant. For example, it can be used to determine the identity of one or a set of SNPs, including insertions, deletions, and translocations. It can be used to determine one or more haplotypes, including the parent of origin of one or more genotypic features.

Note that this method will work with any nucleic acids that can be used for any genotyping and/or sequencing methods, such as the ILLUMINA INFINIUM ARRAY platform, AFFYMETRIX GENECHIP, ILLUMINA GENOME ANALYZER, or LIFE TECHNOLOGIES' SOLID SYSTEM. This includes extracted free-floating DNA from plasma or amplifications (e.g. whole genome amplification, PCR) of the same, and genomic DNA from other cell types (e.g. human lymphocytes from whole blood) or amplifications of the same. For preparation of the DNA, any extraction or purification method that generates genomic DNA suitable for one of these platforms will work as well. This method could work equally well with samples of RNA. In an embodiment, the samples may be stored in a way that minimizes degradation (e.g. below freezing, at about −20 °C, or at a lower temperature).

Definitions

-   Single Nucleotide Polymorphism (SNP) refers to a single nucleotide that may differ between the genomes of two members of the same species. The usage of the term should not imply any limit on the frequency with which each variant occurs.
-   Sequence refers to a DNA sequence or a genetic sequence. It may refer to the primary, physical structure of the DNA molecule or strand in an individual. It may refer to the sequence of nucleotides found in that DNA molecule, or the complementary strand to the DNA molecule. It may refer to the information contained in the DNA molecule as its representation in silico.
-   Locus refers to a particular region of interest on the DNA of an individual, which may refer to a SNP, the site of a possible insertion or deletion, or the site of some other relevant genetic variation. Disease-linked SNPs may also refer to disease-linked loci.
-   Polymorphic Allele, also “Polymorphic Locus,” refers to an allele or locus where the genotype varies between individuals within a given species. Some examples of polymorphic alleles include single nucleotide polymorphisms, short tandem repeats, deletions, duplications, and inversions.
-   Polymorphic Site refers to the specific nucleotides found in a polymorphic region that vary between individuals.
-   Allele refers to the genes that occupy a particular locus.
-   Genetic Data, also “Genotypic Data,” refers to the data describing aspects of the genome of one or more individuals. It may refer to one or a set of loci, partial or entire sequences, partial or entire chromosomes, or the entire genome. It may refer to the identity of one or a plurality of nucleotides; it may refer to a set of sequential nucleotides, or nucleotides from different locations in the genome, or a combination thereof.
    Genotypic data is typically in silico; however, it is also possible to consider physical nucleotides in a sequence as chemically encoded genetic data. Genotypic Data may be said to be “on,” “of,” “at,” or “from” the individual(s). Genotypic Data may refer to output measurements from a genotyping platform where those measurements are made on genetic material.
-   Genetic Material, also “Genetic Sample,” refers to physical matter, such as tissue or blood, from one or more individuals comprising DNA or RNA.
-   Noisy Genetic Data refers to genetic data with any of the following: allele dropouts, uncertain base pair measurements, incorrect base pair measurements, missing base pair measurements, uncertain measurements of insertions or deletions, uncertain measurements of chromosome segment copy numbers, spurious signals, missing measurements, other errors, or combinations thereof.
-   Confidence refers to the statistical likelihood that the called SNP, allele, set of alleles, ploidy call, or determined number of chromosome segment copies correctly represents the real genetic state of the individual.
-   Chromosome may refer to a single chromosome copy, meaning a single molecule of DNA of which there are 46 in a normal somatic cell; an example is ‘the maternally derived chromosome 18’. Chromosome may also refer to a chromosome type, of which there are 23 in a normal human somatic cell; an example is ‘chromosome 18’.
-   Chromosomal Identity may refer to the referent chromosome number, i.e. the chromosome type. Normal humans have 22 numbered autosomal chromosome types and two types of sex chromosomes. It may also refer to the parental origin of the chromosome. It may also refer to a specific chromosome inherited from the parent.
    It may also refer to other identifying features of a chromosome.
-   The State of the Genetic Material, or simply “Genetic State,” may refer to the identity of a set of SNPs on the DNA, to the phased haplotypes of the genetic material, and to the sequence of the DNA, including insertions, deletions, repeats and mutations. It may also refer to the ploidy state of one or more chromosomes, chromosomal segments, or set of chromosomal segments.
-   Allelic Data refers to a set of genotypic data concerning a set of one or more alleles. It may refer to the phased, haplotypic data. It may refer to SNP identities, and it may refer to the sequence data of the DNA, including insertions, deletions, repeats and mutations. It may include the parental origin of each allele.
-   Allelic State refers to the actual state of the genes in a set of one or more alleles. It may refer to the actual state of the genes described by the allelic data.
-   Allelic Ratio, or allele ratio, refers to the ratio between the amount of each allele at a locus that is present in a sample or in an individual. When the sample was measured by sequencing, the allelic ratio may refer to the ratio of sequence reads that map to each allele at the locus. When the sample was measured by an intensity-based measurement method, the allele ratio may refer to the ratio of the amounts of each allele present at that locus as estimated by the measurement method.
-   Allele Count refers to the number of sequences that map to a particular locus, and if that locus is polymorphic, it refers to the number of sequences that map to each of the alleles. If each allele is counted in a binary fashion, then the allele count will be a whole number.
    If the alleles are counted probabilistically, then the allele count can be a fractional number.
-   Allele Count Probability refers to the number of sequences that are likely to map to a particular locus or a set of alleles at a polymorphic locus, combined with the probability of the mapping. Note that allele counts are equivalent to allele count probabilities where the probability of the mapping for each counted sequence is binary (zero or one). In some embodiments, the allele count probabilities may be binary. In some embodiments, the allele count probabilities may be set to be equal to the DNA measurements.
-   Allelic Distribution, or ‘allele count distribution’, refers to the relative amount of each allele that is present for each locus in a set of loci. An allelic distribution can refer to an individual, to a sample, or to a set of measurements made on a sample. In the context of sequencing, the allelic distribution refers to the number or probable number of reads that map to a particular allele for each allele in a set of polymorphic loci. The allele measurements may be treated probabilistically, that is, the likelihood that a given allele is present for a given sequence read is a fraction between 0 and 1, or they may be treated in a binary fashion, that is, any given read is considered to be exactly zero or one copies of a particular allele.
-   Allelic Distribution Pattern refers to a set of different allele distributions for different parental contexts. Certain allelic distribution patterns may be indicative of certain ploidy states.
-   Allelic Bias refers to the degree to which the measured ratio of alleles at a heterozygous locus is different from the ratio that was present in the original sample of DNA.
    The degree of allelic bias at a particular locus is equal to the observed allelic ratio at that locus, as measured, divided by the ratio of alleles in the original DNA sample at that locus. Allelic bias may be defined to be greater than one, such that if the calculation of the degree of allelic bias returns a value, x, that is less than 1, then the degree of allelic bias may be restated as 1/x. Allelic bias may be due to amplification bias, purification bias, or some other phenomenon that affects different alleles differently.
-   Primer, also “PCR probe,” refers to a single DNA molecule (a DNA oligomer) or a collection of DNA molecules (DNA oligomers) where the DNA molecules are identical, or nearly so, and where the primer contains a region that is designed to hybridize to a targeted polymorphic locus, and may contain a priming sequence designed to allow PCR amplification. A primer may also contain a molecular barcode. A primer may contain a random region that differs for each individual molecule.
-   Hybrid Capture Probe refers to any nucleic acid sequence, possibly modified, that is generated by various methods such as PCR or direct synthesis and intended to be complementary to one strand of a specific target DNA sequence in a sample. The exogenous hybrid capture probes may be added to a prepared sample and hybridized through a denature-reannealing process to form duplexes of exogenous-endogenous fragments. These duplexes may then be physically separated from the sample by various means.
-   Sequence Read refers to data representing a sequence of nucleotide bases that were measured using a clonal sequencing method. Clonal sequencing may produce sequence data representing single molecules, or clones or clusters of one original DNA molecule.
    A sequence read may also have an associated quality score at each base position of the sequence indicating the probability that the nucleotide has been called correctly.
-   Mapping a sequence read is the process of determining a sequence read's location of origin in the genome sequence of a particular organism. The location of origin of sequence reads is based on similarity of nucleotide sequence of the read and the genome sequence.
-   Matched Copy Error, also “Matching Chromosome Aneuploidy” (MCA), refers to a state of aneuploidy where one cell contains two identical or nearly identical chromosomes. This type of aneuploidy may arise during the formation of the gametes in meiosis, and may be referred to as a meiotic non-disjunction error. This type of error may arise in mitosis. Matching trisomy may refer to the case where three copies of a given chromosome are present in an individual and two of the copies are identical.
-   Homologous Chromosomes refers to chromosome copies that contain the same set of genes that normally pair up during meiosis.
-   Identical Chromosomes refers to chromosome copies that contain the same set of genes, and for each gene they have the same set of alleles that are identical, or nearly identical.
-   Allele Drop Out (ADO) refers to the situation where at least one of the base pairs in a set of base pairs from homologous chromosomes at a given allele is not detected.
-   Locus Drop Out (LDO) refers to the situation where both base pairs in a set of base pairs from homologous chromosomes at a given allele are not detected.
-   Homozygous refers to having similar alleles at corresponding chromosomal loci.
-   Heterozygous refers to having dissimilar alleles at corresponding chromosomal loci.
-   Heterozygosity Rate refers to the rate of individuals in the population having heterozygous alleles at a given locus.
    The heterozygosity rate may also refer to the expected or measured ratio of alleles at a given locus in an individual or a sample of DNA.
-   Highly Informative Single Nucleotide Polymorphism (HISNP) refers to a SNP where the transplant has an allele that is not present in the transplant recipient's genotype.
-   Chromosomal Region refers to a segment of a chromosome, or a full chromosome.
-   Segment of a Chromosome refers to a section of a chromosome that can range in size from one base pair to the entire chromosome.
-   Chromosome refers to either a full chromosome, or a segment or section of a chromosome.
-   Copies refers to the number of copies of a chromosome segment. It may refer to identical copies, or to non-identical, homologous copies of a chromosome segment wherein the different copies of the chromosome segment contain a substantially similar set of loci, and where one or more of the alleles are different. Note that in some cases of aneuploidy, such as the M2 copy error, it is possible to have some copies of the given chromosome segment that are identical as well as some copies of the same chromosome segment that are not identical.
-   Haplotype refers to a combination of alleles at multiple loci that are typically inherited together on the same chromosome. Haplotype may refer to as few as two loci or to an entire chromosome depending on the number of recombination events that have occurred between a given set of loci.
    Haplotype can also refer to a set of single nucleotide polymorphisms (SNPs) on a single chromatid that are statistically associated.
-   Haplotypic Data, also “Phased Data” or “Ordered Genetic Data,” refers to data from a single chromosome in a diploid or polyploid genome, i.e., either the segregated maternal or paternal copy of a chromosome in a diploid genome.
-   Phasing refers to the act of determining the haplotypic genetic data of an individual given unordered, diploid (or polyploid) genetic data. It may refer to the act of determining which of two genes at an allele, for a set of alleles found on one chromosome, are associated with each of the two homologous chromosomes in an individual.
-   Phased Data refers to genetic data where one or more haplotypes have been determined.
-   Hypothesis refers to a possible ploidy state at a given set of chromosomes, or a set of possible allelic states at a given set of loci. The set of possibilities may comprise one or more elements.
-   Target Individual refers to the individual whose genetic state is being determined. In some embodiments, only a limited amount of DNA is available from the target individual. In some embodiments, the target individual is a transplant. In some embodiments, there may be more than one target individual. In some embodiments, each transplant that originated from a pair of parents may be considered a target individual. In some embodiments, the genetic data that is being determined is one or a set of allele calls. In some embodiments, the genetic data that is being determined is a ploidy call.
-   Related Individual refers to any individual who is genetically related to, and thus shares haplotype blocks with, the target individual.
    In one context, the related individual may be a genetic parent of the target individual, or any genetic material derived from a parent, such as a sperm, a polar body, an embryo, a transplant, or a child. It may also refer to a sibling, parent or a grandparent.
-   DNA of Donor Origin refers to DNA that was originally part of a cell whose genotype was essentially equivalent to that of the transplant donor.
-   DNA of Recipient Origin refers to DNA that was originally part of a cell whose genotype was essentially equivalent to that of the transplant recipient.
-   Transplant Recipient Plasma refers to the plasma portion of the blood from a patient who has received an allograft, e.g., an organ transplant recipient.
-   Clinical Decision refers to any decision to take or not take an action that has an outcome that affects the health or survival of an individual.
-   Diagnostic Box refers to one or a combination of machines designed to perform one or a plurality of aspects of the methods disclosed herein. In an embodiment, the diagnostic box may be placed at a point of patient care. In an embodiment, the diagnostic box may perform targeted amplification followed by sequencing. In an embodiment, the diagnostic box may function alone or with the help of a technician.
-   Informatics Based Method refers to a method that relies heavily on statistics to make sense of a large amount of data. In the context of prenatal diagnosis, it refers to a method designed to determine the ploidy state at one or more chromosomes or the allelic state at one or more alleles by statistically inferring the most likely state, rather than by directly physically measuring the state, given a large amount of genetic data, for example from a molecular array or sequencing.
-   Primary Genetic Data refers to the analog intensity signals that are output by a genotyping platform.
In the context of SNP arrays,    primary genetic data refers to the intensity signals before any    genotype calling has been done. In the context of sequencing,    primary genetic data refers to the analog measurements, analogous to    the chromatogram, that comes off the sequencer before the identity    of any base pairs have been determined, and before the sequence has    been mapped to the genome.-   Secondary Genetic Data refers to processed genetic data that are    output by a genotyping platform. In the context of a SNP array, the    secondary genetic data refers to the allele calls made by software    associated with the SNP array reader, wherein the software has made    a call whether a given allele is present or not present in the    sample. In the context of sequencing, the secondary genetic data    refers to the base pair identities of the sequences have been    determined, and possibly also where the sequences have been mapped    to the genome.-   Preferential Enrichment of DNA that corresponds to a locus, or    preferential enrichment of DNA at a locus, refers to any method that    results in the percentage of molecules of DNA in a post-enrichment    DNA mixture that correspond to the locus being higher than the    percentage of molecules of DNA in the pre-enrichment DNA mixture    that correspond to the locus. The method may involve selective    amplification of DNA molecules that correspond to a locus. The    method may involve removing DNA molecules that do not correspond to    the locus. The method may involve a combination of methods. The    degree of enrichment is defined as the percentage of molecules of    DNA in the post-enrichment mixture that correspond to the locus    divided by the percentage of molecules of DNA in the pre-enrichment    mixture that correspond to the locus. Preferential enrichment may be    carried out at a plurality of loci. In some embodiments of the    present disclosure, the degree of enrichment is greater than 20. 
In    some embodiments of the present disclosure, the degree of enrichment    is greater than 200. In some embodiments of the present disclosure,    the degree of enrichment is greater than 2,000. When preferential    enrichment is carried out at a plurality of loci, the degree of    enrichment may refer to the average degree of enrichment of all of    the loci in the set of loci.-   Amplification refers to a method that increases the number of copies    of a molecule of DNA.-   Selective Amplification may refer to a method that increases the    number of copies of a particular molecule of DNA, or molecules of    DNA that correspond to a particular region of DNA. It may also refer    to a method that increases the number of copies of a particular    targeted molecule of DNA, or targeted region of DNA more than it    increases non-targeted molecules or regions of DNA. Selective    amplification may be a method of preferential enrichment.-   Universal Priming Sequence refers to a DNA sequence that may be    appended to a population of target DNA molecules, for example by    ligation, PCR, or ligation mediated PCR. Once added to the    population of target molecules, primers specific to the universal    priming sequences can be used to amplify the target population using    a single pair of amplification primers. Universal priming sequences    are typically not related to the target sequences.-   Universal Adapters, or ‘ligation adaptors’ or ‘library tags’ are DNA    molecules containing a universal priming sequence that can be    covalently linked to the 5-prime and 3-prime end of a population of    target double stranded DNA molecules. 
The addition of the adapters    provides universal priming sequences to the 5-prime and 3-prime end    of the target population from which PCR amplification can take    place, amplifying all molecules from the target population, using a    single pair of amplification primers.-   Targeting refers to a method used to selectively amplify or    otherwise preferentially enrich those molecules of DNA that    correspond to a set of loci, in a mixture of DNA.-   Joint Distribution Model refers to a model that defines the    probability of events defined in terms of multiple random variables,    given a plurality of random variables defined on the same    probability space, where the probabilities of the variable are    linked. In some embodiments, the degenerate case where the    probabilities of the variables are not linked may be used.-   Limit of Blank (LoB) is the highest apparent analyte concentration    expected to be found when replicates of a blank sample containing no    analyte are tested. For example, as used herein, LoB may be defined    as the empirical 95th percentile value measured from a set of blank    (no-analyte) samples. Accordingly, in an embodiment of the present    disclosure, the sensitivity of the method of determining transplant    status may be determined by a limit of blank (LoB). The desired LoB    may be equal to or less than 5%; it may be equal to or less than 2%;    it may be equal to or less than 1%; it may be equal to or less than    0.5%; it may be equal to or less than 0.25%; it may equal to or less    than 0.23%; it may be equal to or less than 0.11%; it may be equal    to or less than 0.08%; it may be equal to or less than 0.04%.-   Limits of Detection (LoD) is the lowest analyte concentration likely    to be reliably distinguished from the LoB and at which detection is    feasible. LoD is determined by utilizing both the measured LoB and    test replicates of a sample known to contain a low concentration of    analyte. 
For example, LoD may be calculated following the parametric    estimate method specified in EP-17A2, which computes LoD by adding a    standard deviation term to the LoB. Accordingly, in an embodiment of    the present disclosure, the sensitivity of the method of determining    transplant status may be determined by a LoD less than 1%; it may be    less than 0.5%; it may be less than 0.25%; it may equal to or less    than 0.23%; it may be equal to or less than 0.11%; it may be equal    to or less than 0.08%; it may be equal to or less than 0.04%.-   Limits of Quantification (LoQ) is the lowest concentration at which    the analyte can not only be reliably detected but at which some    predefined goals of bias and imprecision are met. LoQ may be    equivalent to LoD or it could be at a higher concentration.
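The LoB and LoD definitions above can be made concrete with a short calculation. The following is a minimal Python sketch, not the disclosed implementation. The function names are hypothetical. The linear-interpolation percentile rule is one common convention, assumed here for the "empirical 95th percentile." The LoD multiplier c_p = 1.645 / (1 - 1/(4(N - K))), for N low-level results from K low-level samples, is the small-sample form we understand EP17-A2 to use; verify it against the guideline before relying on it. Inputs are assumed to be donor-fraction measurements in percent.

```python
import math
import statistics


def limit_of_blank(blank_values, percentile=95.0):
    """Empirical percentile of measurements from blank (no-analyte) samples.

    Linear interpolation between sorted values is an assumed convention;
    the text only specifies the empirical 95th percentile.
    """
    values = sorted(blank_values)
    if not values:
        raise ValueError("need at least one blank measurement")
    rank = (percentile / 100.0) * (len(values) - 1)
    lower, upper = math.floor(rank), math.ceil(rank)
    return values[lower] + (rank - lower) * (values[upper] - values[lower])


def limit_of_detection(lob, low_level_groups):
    """Parametric estimate LoD = LoB + c_p * SD_L (assumed EP17-A2 form).

    low_level_groups: one list of replicate results (at least two each) per
    low-concentration sample. SD_L is the pooled within-sample standard
    deviation; c_p is 1.645 with a small-sample correction.
    """
    n_results = sum(len(group) for group in low_level_groups)
    n_samples = len(low_level_groups)
    dof = n_results - n_samples  # degrees of freedom of the pooled SD
    pooled_ss = sum((len(g) - 1) * statistics.variance(g) for g in low_level_groups)
    sd_l = math.sqrt(pooled_ss / dof)
    c_p = 1.645 / (1.0 - 1.0 / (4.0 * dof))
    return lob + c_p * sd_l
```

With hypothetical blank results of 0.00, 0.01, 0.02, 0.03 and 0.04%, `limit_of_blank` interpolates 80% of the way between the fourth and fifth sorted values, giving an LoB of 0.038%.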

Hypotheses

In the context of this disclosure, a hypothesis refers to a possible transplant status. In some embodiments, a set of hypotheses may be designed such that one hypothesis from the set will correspond to the actual transplant status of any given individual. In some embodiments, a set of hypotheses may be designed such that every possible transplant status may be described by at least one hypothesis from the set. In some embodiments of the present disclosure, one aspect of a method is to determine which hypothesis corresponds to the actual transplant status of the individual in question.

In another embodiment of the present disclosure, one step involves creating a hypothesis. Creating a hypothesis may refer to the act of setting the limits of the variables such that the entire set of possible transplant statuses under consideration is encompassed by those variables.
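As an illustration of creating a hypothesis set and selecting among its members, the sketch below rests on several assumptions that are not stated in the text. Transplant status is summarized by a single donor fraction. The hypothesis set is a grid of candidate fractions between hypothetical limits of 0 and 0.5. The data are B-allele read counts at AA|BB loci, modeled as binomial with a small per-read error rate. All names and parameter values are illustrative.

```python
import math


def make_hypotheses(max_fraction=0.5, step=0.001):
    """Candidate donor fractions spanning [0, max_fraction] (hypothetical limits)."""
    n_steps = int(round(max_fraction / step))
    return [i * step for i in range(n_steps + 1)]


def log_likelihood(donor_fraction, loci, error_rate=0.001):
    """Binomial log-likelihood of B-allele read counts at AA|BB loci.

    loci: list of (b_reads, total_reads). With the recipient homozygous AA and
    the donor homozygous BB, the expected B-read fraction equals the donor
    fraction, blended with a symmetric per-read error rate. The binomial
    coefficient is omitted because it is identical under every hypothesis.
    """
    p = donor_fraction * (1 - error_rate) + (1 - donor_fraction) * error_rate
    return sum(b * math.log(p) + (n - b) * math.log(1 - p) for b, n in loci)


def most_likely_hypothesis(loci, hypotheses):
    """Return the hypothesis (donor fraction) with the highest likelihood."""
    return max(hypotheses, key=lambda f: log_likelihood(f, loci))


# Hypothetical counts: about 1% of reads carry the donor-only allele.
observed = [(10, 1000), (12, 1000), (8, 1000)]
best = most_likely_hypothesis(observed, make_hypotheses())
```

Because this log-likelihood is concave in the donor fraction, the best grid point always lies next to the continuous maximum-likelihood estimate. A finer grid trades computation for resolution.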

Genotypic Contexts

The genotypic context refers to the genetic state of a given allele on each of the two relevant chromosomes for one or both of the two sources of the target. The genotypic context for a given SNP may consist of four base pairs; they may be the same or different from one another. It is typically written as “m₁m₂|f₁f₂,” where m₁ and m₂ are the genetic states of the given SNP on the two donor chromosomes, and f₁ and f₂ are the genetic states of the given SNP on the two recipient chromosomes. In some embodiments, the genotypic context may be written as “f₁f₂|m₁m₂.” Note that subscripts “1” and “2” refer to the genotype, at the given allele, of the first and second chromosome; also note that the choice of which chromosome is labeled “1” and which is labeled “2” is arbitrary.

Note that in this disclosure, A and B are often used to generically represent base pair identities; A or B could equally well represent C (cytosine), G (guanine), A (adenine), or T (thymine). For example, if, at a given SNP-based allele, the transplant recipient's genotype was T at that SNP on one chromosome and G at that SNP on the homologous chromosome, and the transplant donor's genotype at that allele is G on both homologous chromosomes, one may say (writing the recipient genotype first, in the f₁f₂|m₁m₂ form) that the target individual's allele has the genotypic context of AB|BB; it could also be said that the allele has the genotypic context of AB|AA. Note that, in theory, any of the four possible nucleotides could occur at a given allele, and thus it is possible, for example, for the transplant recipient to have a genotype of AT and the transplant donor to have a genotype of GC at a given allele. However, empirical data indicate that in most cases only two of the four possible base pairs are observed at a given allele. It is possible, for example when using short tandem repeats, to have more than two alleles at a locus, and thus more than four, or even more than ten, contexts. In this disclosure the discussion assumes that only two possible base pairs will be observed at a given allele, although the embodiments disclosed herein could be modified to take into account the cases where this assumption does not hold.
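The relabeling of nucleotides as A and B described above can be sketched as follows. The function name is hypothetical. The sketch writes the recipient genotype first (the f₁f₂|m₁m₂ form) and ignores phase. It also assigns A to the alphabetically first nucleotide. That rule is an arbitrary assumption, consistent with the text's point that either labeling is valid.

```python
def genotypic_context(recipient, donor):
    """Return an unphased A/B genotypic context string, recipient first.

    recipient, donor: two-character nucleotide genotypes, e.g. "TG", "GG".
    Assumes at most two distinct nucleotides across both individuals; A is
    assigned to the alphabetically first nucleotide (an arbitrary choice).
    """
    alleles = sorted(set(recipient) | set(donor))
    if len(alleles) > 2:
        raise ValueError("more than two alleles observed; A/B labeling does not apply")
    label = {alleles[0]: "A"}
    if len(alleles) == 2:
        label[alleles[1]] = "B"

    def relabel(genotype):
        # Unphased data: sort the labels so that BA is reported as AB.
        return "".join(sorted(label[base] for base in genotype))

    return relabel(recipient) + "|" + relabel(donor)
```

For the example in the text, `genotypic_context("TG", "GG")` labels G as A and returns `AB|AA`. Labeling T as A instead would give the equally valid `AB|BB`.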

A “genotypic context” may refer to a set or subset of target SNPs that have the same genotypic context. For example, if one were to measure 1,000 alleles on a given chromosome on a target individual, then the context AA|BB could refer to the set of all alleles in the group of 1,000 alleles where the genotype of the transplant recipient of the target is homozygous, and the genotype of the transplant donor of the target is homozygous, but where the recipient genotype and the donor genotype are dissimilar at that locus. If the data is not phased, and thus AB=BA, then there are nine possible genotypic contexts: AA|AA, AA|AB, AA|BB, AB|AA, AB|AB, AB|BB, BB|AA, BB|AB, and BB|BB. If the data is phased, and thus AB≠BA, then there are sixteen different possible genotypic contexts: AA|AA, AA|AB, AA|BA, AA|BB, AB|AA, AB|AB, AB|BA, AB|BB, BA|AA, BA|AB, BA|BA, BA|BB, BB|AA, BB|AB, BB|BA, and BB|BB. Every SNP allele on a chromosome, excluding some SNPs on the sex chromosomes, has one of these genotypic contexts. The set of SNPs wherein the genotypic context for one parent is heterozygous may be referred to as the heterozygous context.
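The counts of nine unphased and sixteen phased contexts follow from pairing every possible genotype of one individual with every possible genotype of the other. The minimal sketch below enumerates them in the order listed above; the function name is hypothetical.

```python
from itertools import product


def enumerate_contexts(phased):
    """List every two-allele genotypic context in the order used in the text."""
    if phased:
        # AA, AB, BA, BB: AB and BA are distinct when phase is known.
        genotypes = ["".join(pair) for pair in product("AB", repeat=2)]
    else:
        # Without phase, AB and BA collapse into a single genotype.
        genotypes = ["AA", "AB", "BB"]
    return [first + "|" + second for first, second in product(genotypes, repeat=2)]
```

Three unphased genotypes per individual give 3 × 3 = 9 contexts. Four phased genotypes give 4 × 4 = 16.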

Use of Genotypic Contexts in Non-Invasive Determination of Transplant State

Non-invasive determination of transplant state is an important technique that can be used to determine the genetic state of a transplant from genetic material that is obtained in a non-invasive manner, for example from a blood draw on the transplant recipient. The blood could be separated and the plasma isolated, followed by isolation of the plasma DNA. Size selection could be used to isolate the DNA of the appropriate length. The DNA may be preferentially enriched at a set of loci. This DNA can then be measured by a number of means, such as by hybridizing to a genotyping array and measuring the fluorescence, or by sequencing on a high throughput sequencer.

When considering which alleles to target, one may consider that some genotypic contexts are likely to be more informative than others. For example, AA|BB and the symmetric context BB|AA are the most informative contexts, because the transplant is known to carry an allele that the transplant recipient does not carry. For reasons of symmetry, both AA|BB and BB|AA contexts may be referred to as AA|BB. Another informative set of genotypic contexts is AA|AB and BB|AB, because in these cases the transplant has a 50% chance of carrying an allele that the transplant recipient does not have. For reasons of symmetry, both AA|AB and BB|AB contexts may be referred to as AA|AB. A third informative set of genotypic contexts is AB|AA and AB|BB, because in these cases the transplant is carrying a known donor allele, and that allele is also present in the recipient genome. For reasons of symmetry, both AB|AA and AB|BB contexts may be referred to as AB|AA. A fourth context is AB|AB, where the transplant has an unknown allelic state, and whatever the allelic state, it is one in which the transplant recipient has the same alleles. The fifth context is AA|AA, where the transplant recipient and transplant donor are both homozygous for the same allele.
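To see why the AA|BB and AA|AB contexts carry the most information, one can compute the expected fraction of B-allele reads in the mixed cell-free DNA for each context. The sketch below makes several assumptions that the text does not state. A fraction f of the cell-free DNA is donor-derived, and each individual's two alleles are equally represented. The recipient genotype is written first. There is no amplification bias or sequencing error. The function name is hypothetical.

```python
def expected_b_fraction(context, donor_fraction):
    """Expected fraction of B-allele reads for a recipient|donor context.

    context: unphased context such as "AA|BB", recipient genotype first.
    donor_fraction: assumed fraction f of cell-free DNA that is donor-derived.
    """
    recipient, donor = context.split("|")
    recipient_b = recipient.count("B") / 2  # B-allele dosage in recipient DNA
    donor_b = donor.count("B") / 2          # B-allele dosage in donor DNA
    return (1 - donor_fraction) * recipient_b + donor_fraction * donor_b


for ctx in ["AA|BB", "AA|AB", "AB|AA", "AB|AB", "AA|AA"]:
    print(ctx, round(expected_b_fraction(ctx, 0.1), 3))
```

With f = 0.1, the expected B fraction is 0.1 for AA|BB, 0.05 for AA|AB, 0.45 for AB|AA, 0.5 for AB|AB, and 0 for AA|AA. In AA|BB and AA|AB the recipient contributes no B reads, so any B signal above the error rate can come only from the donor.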

Different Implementations of the Presently Disclosed Embodiments

In some embodiments the source of the genetic material to be used in determining the genetic state of the transplant may be transplanted donor-derived cells. The method may involve obtaining a blood sample from the transplant recipient.

In an embodiment of the present disclosure, the target individual is a transplant, and the different genotype measurements are made on a plurality of DNA samples from the transplant. In some embodiments of the present disclosure, the donor-derived DNA samples are from isolated transplanted cells where the donor-derived cells may be mixed with recipient cells. In some embodiments of the present disclosure, the donor-derived DNA samples are from free floating donor-derived DNA, where the donor DNA may be mixed with free floating recipient DNA.

In some embodiments, the genetic sample may be prepared and/or purified. There are a number of standard procedures known in the art to accomplish such an end. In some embodiments, the sample may be centrifuged to separate various layers. In some embodiments, the DNA may be isolated using filtration. In some embodiments, the preparation of the DNA may involve amplification, separation, purification by chromatography, liquid-liquid separation, isolation, preferential enrichment, preferential amplification, targeted amplification, or any of a number of other techniques either known in the art or described herein.

In some embodiments, a method of the present disclosure may involve amplifying DNA. Amplification of the DNA, a process which transforms a small amount of genetic material to a larger amount of genetic material that comprises a similar set of genetic data, can be done by a wide variety of methods, including, but not limited to, polymerase chain reaction (PCR). One method of amplifying DNA is whole genome amplification (WGA). There are a number of methods available for WGA: ligation-mediated PCR (LM-PCR), degenerate oligonucleotide primer PCR (DOP-PCR), and multiple displacement amplification (MDA). In LM-PCR, short DNA sequences called adapters are ligated to blunt ends of DNA. These adapters contain universal amplification sequences, which are used to amplify the DNA by PCR. In DOP-PCR, random primers that also contain universal amplification sequences are used in a first round of annealing and PCR. Then, a second round of PCR is used to amplify the sequences further with the universal primer sequences. MDA uses the phi-29 polymerase, which is a highly processive and non-specific enzyme that replicates DNA and has been used for single-cell analysis. The major limitations to amplification of material from a single cell are (1) the necessity of using extremely dilute DNA concentrations or an extremely small volume of reaction mixture, and (2) the difficulty of reliably dissociating DNA from proteins across the whole genome. Regardless, single-cell whole genome amplification has been used successfully for a variety of applications for a number of years. There are other methods of amplifying DNA from a sample of DNA. The DNA amplification transforms the initial sample of DNA into a sample of DNA that is similar in the set of sequences, but of much greater quantity. In some cases, amplification may not be required.
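The increase in quantity produced by PCR-based amplification can be illustrated with the textbook exponential model N = N₀(1 + E)ⁿ. Here N₀ is the starting copy number, E is the per-cycle efficiency (1.0 means perfect doubling), and n is the number of cycles. This idealized sketch ignores the plateau phase and sequence-dependent differences in efficiency. The function name is hypothetical.

```python
def copies_after_pcr(initial_copies, cycles, efficiency=1.0):
    """Idealized exponential PCR yield N = N0 * (1 + E) ** n (no plateau)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(copies_after_pcr(10, 20))        # perfect doubling for 20 cycles
print(copies_after_pcr(10, 10, 0.9))   # 90% efficiency for 10 cycles
```

For example, 10 template copies amplified for 20 perfect cycles yield 10 × 2²⁰ = 10,485,760 copies. At an efficiency of 0.9, 10 cycles multiply the input by only about 613.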

In some embodiments, DNA may be amplified using a universal amplification, such as WGA or MDA. In some embodiments, DNA may be amplified by targeted amplification, for example using targeted PCR or circularizing probes. In some embodiments, the DNA may be preferentially enriched using a targeted amplification method, or a method that results in the full or partial separation of desired from undesired DNA, such as capture by hybridization approaches. In some embodiments, DNA may be amplified by using a combination of a universal amplification method and a preferential enrichment method. A fuller description of some of these methods can be found elsewhere in this document.

The genetic data of the target individual and/or of the related individual can be transformed from a molecular state to an electronic state by measuring the appropriate genetic material using tools and/or techniques taken from a group including, but not limited to, genotyping microarrays and high throughput sequencing. Some sequencing methods include Sanger DNA sequencing, pyrosequencing, the ILLUMINA SOLEXA platform, ILLUMINA's GENOME ANALYZER, ROCHE's 454 sequencing platform, HELICOS's TRUE SINGLE MOLECULE SEQUENCING platform, HALCYON MOLECULAR's electron microscope sequencing method, or any other sequencing method. All of these methods physically transform the genetic data stored in a sample of DNA into a set of genetic data that is typically stored in a memory device en route to being processed.

A relevant individual's genetic data may be measured by analyzing substances taken from a group including, but not limited to: the individual's bulk diploid tissue, one or more diploid cells from the individual, one or more haploid cells from the individual, one or more blastomeres from the target individual, extra-cellular genetic material found on the individual, extra-cellular genetic material from the individual found in maternal blood, cells from the individual found in maternal blood, one or more embryos created from (a) gamete(s) from the related individual, one or more blastomeres taken from such an embryo, extra-cellular genetic material found on the related individual, genetic material known to have originated from the related individual, and combinations thereof.

In some embodiments, the knowledge of the determined transplant status may be used to make a clinical decision. This knowledge, typically stored as a physical arrangement of matter in a memory device, may then be transformed into a report. The report may then be acted upon. For example, the clinical decision may be to adjust immunosuppressive medication intake by a transplant recipient.

In an embodiment of the present disclosure, any of the methods described herein may be modified to allow for multiple targets to come from the same target individual, for example, multiple blood draws from the same transplant recipient. This may improve the accuracy of the model, as multiple genetic measurements may provide more data with which the target genotype may be determined. In an embodiment, one set of target genetic data may serve as the primary data that is reported, and the other may serve as data to double-check the primary target genetic data. In an embodiment, a plurality of sets of genetic data, each measured from genetic material taken from the target individual, are considered in parallel.

In an embodiment, the raw genetic material of the transplant recipient and the transplant donor is transformed by way of amplification to an amount of DNA that is similar in sequence, but larger in quantity. Then, by way of a genotyping method, the genotypic data that is encoded by nucleic acids is transformed into genetic measurements that may be stored physically and/or electronically on a memory device, such as those described above. Then, through the execution of the computer program on the computer hardware, instead of being physically encoded bits and bytes, arranged in a pattern that represents raw measurement data, they become transformed into a pattern that represents a high confidence determination of the transplant status of the recipient. The details of this transformation will rely on the data itself and the computer language and hardware system used to execute the method described herein. Then, the data that is physically configured to represent a high quality transplant status determination of the recipient is transformed into a report which may be sent to a health care practitioner. This transformation may be carried out using a printer or a computer display. The report may be a printed copy, on paper or other suitable medium, or else it may be electronic. In the case of an electronic report, it may be transmitted, it may be physically stored on a memory device at a location on the computer accessible by the health care practitioner, or it may be displayed on a screen so that it may be read. In the case of a screen display, the data may be transformed to a readable format by causing the physical transformation of pixels on the display device. The transformation may be accomplished by way of physically firing electrons at a phosphorescent screen, or by way of altering an electric charge that physically changes the transparency of a specific set of pixels on a screen that may lie in front of a substrate that emits or absorbs photons.
This transformation may be accomplished by way of changing the nanoscale orientation of the molecules in a liquid crystal, for example, from nematic to cholesteric or smectic phase, at a specific set of pixels. This transformation may be accomplished by way of an electric current causing photons to be emitted from a specific set of pixels made from a plurality of light emitting diodes arranged in a meaningful pattern. This transformation may be accomplished by any other way used to display information, such as a computer screen, or some other output device or way of transmitting information. The health care practitioner may then act on the report, such that the data in the report is transformed into an action. The action may be to continue or discontinue immunosuppressive medication. In some embodiments, the action may be to increase or decrease immunosuppressive medication.

In some embodiments, the methods described herein can be used at a very early period of time following transplantation surgery, for example as early as the day of surgery, one day after surgery, two days after surgery, three days after surgery, four days after surgery, five days after surgery, six days after surgery, a week after surgery, two weeks after surgery, three weeks after surgery, four weeks after surgery, one month after surgery, two months after surgery, three months after surgery, four months after surgery, five months after surgery, six months after surgery, seven months after surgery, eight months after surgery, nine months after surgery, ten months after surgery, eleven months after surgery, or a year or more after surgery.

Any of the embodiments disclosed herein may be implemented in digital electronic circuitry, integrated circuitry, specially designed ASICs (application-specific integrated circuits), computer hardware, firmware, software, or in combinations thereof. Apparatus of the presently disclosed embodiments can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method steps of the presently disclosed embodiments can be performed by a programmable processor executing a program of instructions to perform functions of the presently disclosed embodiments by operating on input data and generating output. The presently disclosed embodiments can be implemented advantageously in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language. A computer program may be deployed in any form, including as a stand-alone program, or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may be deployed to be executed or interpreted on one computer or on multiple computers at one site, or distributed across multiple sites and interconnected by a communication network.

Computer readable storage media, as used herein, refers to physical or tangible storage (as opposed to signals) and includes without limitation volatile and non-volatile, removable and non-removable media implemented in any method or technology for the tangible storage of information such as computer-readable instructions, data structures, program modules or other data. Computer readable storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other physical or material medium which can be used to tangibly store the desired information or data or instructions and which can be accessed by a computer or processor.

Any of the methods described herein may include the output of data in a physical format, such as on a computer screen, or on a paper printout. In explanations of any embodiments elsewhere in this document, it should be understood that the described methods may be combined with the output of the actionable data in a format that can be acted upon by a physician. In addition, the described methods may be combined with the actual execution of a clinical decision that results in a clinical treatment, or the execution of a clinical decision to make no action. Some of the embodiments described in the document for determining genetic data pertaining to a target individual may be combined with a clinical decision or action. Some of the embodiments described in the document for determining genetic data pertaining to a target individual may be combined with the notification of a potential transplant rejection, or lack thereof, to a medical professional. Some of the embodiments described herein may be combined with the output of the actionable data, and the execution of a clinical decision that results in a clinical treatment, or the execution of a clinical decision to make no action.

Targeted Enrichment and Sequencing

The use of a technique to enrich a sample of DNA at a set of target loci followed by sequencing as part of a method for non-invasive determination of transplant status in a transplant recipient may confer a number of unexpected advantages. In some embodiments of the present disclosure, the method involves measuring genetic data for use with an informatics based method. The ultimate outcome of some of the embodiments is the actionable data of the status of a transplant. There are many methods that may be used to measure the genetic data of the individual and/or the related individuals as part of embodied methods. In an embodiment, a method for enriching the concentration of a set of targeted alleles is disclosed herein, the method comprising one or more of the following steps: targeted amplification of genetic material, addition of locus-specific oligonucleotide probes, ligation of specified DNA strands, isolation of sets of desired DNA, removal of unwanted components of a reaction, detection of certain sequences of DNA by hybridization, and detection of the sequence of one or a plurality of strands of DNA by DNA sequencing methods. In some cases the DNA strands may refer to target genetic material, in some cases they may refer to primers, in some cases they may refer to synthesized sequences, or combinations thereof. These steps may be carried out in a number of different orders. Given the highly variable nature of molecular biology, it is generally not obvious which methods, and which combinations of steps, will perform poorly, well, or best in various situations.

For example, a universal amplification step of the DNA prior to targeted amplification may confer several advantages, such as removing the risk of bottlenecking and reducing allelic bias. The DNA may be mixed with an oligonucleotide probe that can hybridize with two neighboring regions of the target sequence, one on either side. After hybridization, the ends of the probe may be connected by adding a polymerase, a means for ligation, and any necessary reagents to allow the circularization of the probe. After circularization, an exonuclease may be added to digest non-circularized genetic material, followed by detection of the circularized probe. The DNA may be mixed with PCR primers that can hybridize with two neighboring regions of the target sequence, one on either side. After hybridization, the ends of the probe may be connected by adding a polymerase, a means for ligation, and any necessary reagents to complete PCR amplification. Amplified or unamplified DNA may be targeted by hybrid capture probes that target a set of loci; after hybridization, the probe may be localized and separated from the mixture to provide a mixture of DNA that is enriched in target sequences.

In some embodiments the detection of the target genetic material may be done in a multiplexed fashion. The number of genetic target sequences that may be run in parallel can range from one to ten, ten to one hundred, one hundred to one thousand, one thousand to ten thousand, ten thousand to one hundred thousand, one hundred thousand to one million, or one million to ten million. Note that the prior art includes disclosures of successful multiplexed PCR reactions involving pools of up to about 50 or 100 primers, and not more. Prior attempts to multiplex more than 100 primers per pool have resulted in significant problems with unwanted side reactions such as primer-dimer formation.
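One way to see why primer-dimer problems escalate with pool size is to count the primer combinations that could, in principle, interact. The count grows quadratically with the number of primers. The sketch below only counts unordered pairings of individual oligonucleotides, including a primer with itself. It is a combinatorial illustration and does not predict how many dimers actually form, since that depends on sequence complementarity. The function name is hypothetical.

```python
def candidate_primer_pairs(n_primers):
    """Unordered primer-primer combinations among n oligos, counting self-pairs."""
    return n_primers * (n_primers + 1) // 2


for n in (50, 100, 1_000, 10_000):
    print(n, "primers ->", candidate_primer_pairs(n), "possible pairings")
```

Going from 100 to 10,000 primers raises the number of possible pairings from 5,050 to about 50 million, a 10,000-fold increase for a 100-fold larger pool.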

In some embodiments, this method may be used to genotype a single cell, a small number of cells, two to five cells, six to ten cells, ten to twenty cells, twenty to fifty cells, fifty to one hundred cells, one hundred to one thousand cells, or a small amount of extracellular DNA, for example from one to ten picograms, from ten to one hundred picograms, from one hundred picograms to one nanogram, from one to ten nanograms, from ten to one hundred nanograms, or from one hundred nanograms to one microgram.
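The input masses listed above can be related to the number of genome copies available for genotyping. The sketch below is a rough conversion. It assumes a haploid human genome of about 3.1 billion base pairs and an average mass of about 650 g/mol per double-stranded base pair. Under these assumptions one haploid genome weighs roughly 3.3 pg. The constants are approximations and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
MEAN_BP_MASS = 650.0           # g/mol per double-stranded base pair (approximate)
HAPLOID_GENOME_BP = 3.1e9      # approximate human haploid genome length (assumed)


def haploid_genome_copies(mass_pg):
    """Approximate number of haploid human genome copies in mass_pg picograms of DNA."""
    genome_mass_g = HAPLOID_GENOME_BP * MEAN_BP_MASS / AVOGADRO  # about 3.3e-12 g
    return mass_pg * 1e-12 / genome_mass_g


for mass in (10, 100, 1000):  # 10 pg, 100 pg, 1 ng
    print(mass, "pg ->", round(haploid_genome_copies(mass)), "haploid copies")
```

By this estimate, 1 ng of DNA holds roughly 300 haploid genome copies, or about 150 diploid genome equivalents. Ten picograms holds only about three copies.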

The use of a method to target certain loci followed by sequencing as part of a method for transplant state calling may confer a number of unexpected advantages. Some methods by which DNA may be targeted, or preferentially enriched, include using circularizing probes, linked inverted probes (LIPs, MIPs), capture by hybridization methods such as SURESELECT, and targeted PCR or ligation-mediated PCR amplification strategies.

There are many methods that may be used to measure the genetic data of the individual and/or the related individuals in the aforementioned contexts. The different methods comprise a number of steps, those steps often involving amplification of genetic material, addition of oligonucleotide probes, ligation of specified DNA strands, isolation of sets of desired DNA, removal of unwanted components of a reaction, detection of certain sequences of DNA by hybridization, and detection of the sequence of one or a plurality of strands of DNA by DNA sequencing methods. In some cases the DNA strands may refer to target genetic material, in some cases they may refer to primers, in some cases they may refer to synthesized sequences, or combinations thereof. These steps may be carried out in a number of different orders. Given the highly variable nature of molecular biology, it is generally not obvious which methods, and which combinations of steps, will perform poorly, well, or best in various situations.

Note that in theory it is possible to target any number of loci in the genome, anywhere from one locus to well over one million loci. If a sample of DNA is subjected to targeting and then sequenced, the percentage of the alleles that are read by the sequencer will be enriched with respect to their natural abundance in the sample. The degree of enrichment can be anywhere from one percent (or even less) to ten-fold, a hundred-fold, a thousand-fold, or even many million-fold. In the human genome there are roughly 3 billion base pairs, comprising approximately 75 million polymorphic loci. The more loci that are targeted, the smaller the possible degree of enrichment. The fewer the loci that are targeted, the greater the possible degree of enrichment, and the greater the depth of read that may be achieved at those loci for a given number of sequence reads.
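The degree of enrichment, as defined earlier in this section, is the on-target share of molecules after enrichment divided by the on-target share before enrichment. The trade-off between the number of targeted loci and read depth follows from dividing a fixed read budget among those loci. The sketch below assumes that read proportions stand in for molecule proportions and that on-target reads spread evenly across loci. Real assays show uneven coverage. The function names and example numbers are hypothetical.

```python
def degree_of_enrichment(on_target_pre, total_pre, on_target_post, total_post):
    """Post-enrichment on-target fraction divided by pre-enrichment fraction."""
    pre_fraction = on_target_pre / total_pre
    post_fraction = on_target_post / total_post
    return post_fraction / pre_fraction


def mean_depth_per_locus(total_reads, on_target_fraction, n_loci):
    """Average reads per targeted locus, assuming reads spread evenly over loci."""
    return total_reads * on_target_fraction / n_loci


# Hypothetical: 0.1% of molecules on target before enrichment, 80% after.
print(degree_of_enrichment(1, 1000, 800, 1000))
# The same 10 million reads spread over 10,000 versus 100,000 loci.
print(mean_depth_per_locus(10_000_000, 0.8, 10_000))
print(mean_depth_per_locus(10_000_000, 0.8, 100_000))
```

In this example, enrichment is 800-fold. Spreading the same read budget over ten times as many loci cuts the mean depth from 800 to 80 reads per locus.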

In an embodiment of the present disclosure, the targeting or preferential enrichment may focus entirely on SNPs. In an embodiment, the targeting or preferential enrichment may focus on any polymorphic site. A number of commercial targeting products are available to enrich exons. Surprisingly, targeting exclusively SNPs, or exclusively polymorphic loci, is particularly advantageous. Methodologies that do not focus on polymorphic alleles would not benefit as much from targeting or preferential enrichment of a set of alleles.

In an embodiment of the present disclosure, a targeting method that focuses on SNPs may be used to enrich a genetic sample in polymorphic regions of the genome. In an embodiment, it is possible to focus on a small number of SNPs, for example between 1 and 100 SNPs, or on a larger number, for example between 100 and 1,000, between 1,000 and 10,000, between 10,000 and 100,000, or more than 100,000 SNPs. In an embodiment, it is possible to focus on one or a small number of chromosomes that are correlated with live trisomic births, for example chromosomes 13, 18, 21, X and Y, or some combination thereof. In an embodiment, the targeted SNPs may be enriched by a small factor, for example between 1.01-fold and 100-fold, or by a larger factor, for example between 100-fold and 1,000,000-fold, or even by more than 1,000,000-fold. In an embodiment of the present disclosure, a targeting method may be used to create a sample of DNA that is preferentially enriched in polymorphic regions of the genome. In an embodiment, this method may be used to create a mixture of DNA with any of these characteristics, where the mixture of DNA contains transplant recipient DNA and also free-floating donor-derived DNA. In an embodiment, this method may be used to create a mixture of DNA that has any combination of these factors. Any of the targeting methods described herein can be used to create mixtures of DNA that are preferentially enriched in certain loci.

In some embodiments, a method of the present disclosure further includes measuring the DNA in the mixed fraction using a high-throughput DNA sequencer, where the DNA in the mixed fraction contains a disproportionate number of sequences from one or more chromosomes.

Described herein are three methods, namely multiplex PCR, targeted capture by hybridization, and linked inverted probes (LIPs), which may be used to obtain and analyze measurements from a sufficient number of polymorphic loci from a transplant recipient plasma sample in order to detect transplant rejection; this is not meant to exclude other methods of selective enrichment of targeted loci. Other methods may equally well be used without changing the essence of the method. In each case the polymorphisms assayed may include single nucleotide polymorphisms (SNPs), small indels, or STRs. A preferred method involves the use of SNPs. Each approach produces allele frequency data; the allele frequency data for each targeted locus and/or the joint allele frequency distributions from these loci may be analyzed to determine the rejection and/or injury status of the transplant. Each approach has its own considerations due to the limited source material and the fact that transplant recipient plasma consists of a mixture of recipient and donor-derived DNA. This method may be combined with other approaches to provide a more accurate determination. In an embodiment, this method may be combined with a sequence counting approach such as that described in U.S. Pat. No. 7,888,017.

Accurately Measuring the Allelic Distributions in a Sample

Current sequencing approaches can be used to estimate the distribution of alleles in a sample. One such method involves randomly sampling sequences from a pool of DNA, termed shotgun sequencing. The proportion of reads covering a particular allele in the sequencing data is typically very low and can be determined by simple statistics. The human genome contains approximately 3 billion base pairs, so if the sequencing method used makes 100 bp reads, a particular allele will be measured about once in every 30 million sequence reads.
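The arithmetic in the last sentence can be checked directly. This is a minimal sketch that assumes reads are drawn uniformly from a 3-billion-base genome and ignores ploidy and edge effects; the function names are hypothetical.

```python
def reads_per_observation(genome_bp: int, read_len: int) -> float:
    """Expected number of uniformly sampled reads per read that covers one given base.

    Each read of length read_len covers a given base with probability of roughly
    read_len / genome_bp (ploidy and edge effects are ignored).
    """
    return genome_bp / read_len


def expected_observations(n_reads: int, read_len: int, genome_bp: int) -> float:
    """Expected number of reads covering one given base after n_reads reads."""
    return n_reads * read_len / genome_bp


print(reads_per_observation(3_000_000_000, 100))              # 30000000.0
print(expected_observations(30_000_000, 100, 3_000_000_000))  # 1.0
```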

In an embodiment, a method of the present disclosure is used to determine the presence or absence of two or more different haplotypes that contain the same set of loci in a sample of DNA, from the measured allele distributions of loci on a given chromosome. Alleles that are polymorphic between the haplotypes tend to be more informative; however, any alleles where the transplant recipient and transplant donor are not both homozygous for the same allele will yield useful information through the measured allele distributions, beyond the information that is available from simple read count analysis.
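The informativeness rule in this paragraph can be written as a one-line predicate. The sketch below assumes that recipient and donor genotypes are already known at each locus, an assumption made only for illustration; the data structure and names are hypothetical.

```python
from typing import Dict, Tuple

Genotype = Tuple[str, str]  # unordered pair of alleles, e.g. ("A", "G")


def is_informative(recipient: Genotype, donor: Genotype) -> bool:
    """False only when recipient and donor are both homozygous for the same allele."""
    recipient_hom = recipient[0] == recipient[1]
    donor_hom = donor[0] == donor[1]
    return not (recipient_hom and donor_hom and recipient[0] == donor[0])


def informative_loci(genotypes: Dict[str, Tuple[Genotype, Genotype]]) -> list:
    """Return the IDs of loci whose measured allele distribution can carry donor signal."""
    return [locus for locus, (rec, don) in genotypes.items() if is_informative(rec, don)]


panel = {
    "snp1": (("A", "A"), ("A", "A")),  # both homozygous, same allele: uninformative
    "snp2": (("A", "A"), ("A", "G")),  # donor-specific G allele
    "snp3": (("C", "C"), ("T", "T")),  # homozygous for different alleles
    "snp4": (("A", "G"), ("A", "G")),  # both heterozygous
}
print(informative_loci(panel))  # ['snp2', 'snp3', 'snp4']
```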

Shotgun sequencing of such a sample, however, is extremely inefficient, as it results in many sequences for regions that are not polymorphic between the different haplotypes in the sample, or that lie on chromosomes that are not of interest, and which therefore reveal no information about the proportion of the target haplotypes. Described herein are methods that specifically target and/or preferentially enrich segments of DNA in the sample that are more likely to be polymorphic in the genome, to increase the yield of allelic information obtained by sequencing. Note that for the measured allele distributions in an enriched sample to be truly representative of the actual amounts present in the target individual, it is critical that there is little or no preferential enrichment of one allele as compared to the other allele at a given locus in the targeted segments. Current methods known in the art to target polymorphic alleles are designed to ensure that at least some of any alleles present are detected. However, these methods were not designed for the purpose of measuring the unbiased allelic distributions of polymorphic alleles present in the original mixture. It is non-obvious that any particular method of target enrichment would be able to produce an enriched sample wherein the measured allele distributions would accurately represent the allele distributions present in the original unamplified sample better than any other method. While many enrichment methods may be expected, in theory, to accomplish such an aim, an ordinary person skilled in the art is well aware that there is a great deal of stochastic or deterministic bias in current amplification, targeting and other preferential enrichment methods. One embodiment of a method described herein allows a plurality of alleles found in a mixture of DNA that correspond to a given locus in the genome to be amplified, or preferentially enriched, in a way that the degree of enrichment of each of the alleles is nearly the same. Another way to say this is that the method allows the relative quantity of the alleles present in the mixture as a whole to be increased, while the ratio between the alleles that correspond to each locus remains essentially the same as it was in the original mixture of DNA. Prior-art methods for preferential enrichment of loci can result in allelic biases of more than 1%, more than 2%, more than 5%, and even more than 10%. This preferential enrichment may be due to capture bias when using a capture-by-hybridization approach, or to amplification bias, which may be small for each cycle but can become large when compounded over 20, 30, or 40 cycles. For the purposes of this disclosure, for the ratio to remain essentially the same means that the ratio of the alleles in the original mixture divided by the ratio of the alleles in the resulting mixture is between 0.95 and 1.05, between 0.98 and 1.02, between 0.99 and 1.01, between 0.995 and 1.005, between 0.998 and 1.002, between 0.999 and 1.001, or between 0.9999 and 1.0001. Note that the calculation of the allele ratios presented here may not be used in the determination of the transplant status of the transplant recipient, and may only be a metric used to measure allelic bias.
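The bias metric defined above, and the way a small per-cycle amplification bias compounds, can be made concrete with a short sketch. The per-cycle model (a constant relative efficiency advantage for one allele) is a simplifying assumption introduced here for illustration, and the function names are hypothetical.

```python
def ratio_preservation(orig_a: float, orig_b: float, enr_a: float, enr_b: float) -> float:
    """(A:B ratio in the original mixture) / (A:B ratio after enrichment)."""
    return (orig_a / orig_b) / (enr_a / enr_b)


def essentially_unchanged(ratio: float, tolerance: float = 0.05) -> bool:
    """True if the ratio lies within [1 - tolerance, 1 + tolerance], e.g. 0.95 to 1.05."""
    return 1 - tolerance <= ratio <= 1 + tolerance


def compounded_bias(per_cycle_advantage: float, cycles: int) -> float:
    """Factor by which an A:B ratio is distorted if allele A amplifies
    (1 + per_cycle_advantage) times as efficiently as allele B in every cycle."""
    return (1 + per_cycle_advantage) ** cycles


# A 50:50 locus that reads out as 51:49 after enrichment.
r = ratio_preservation(50, 50, 51, 49)
print(round(r, 4), essentially_unchanged(r), essentially_unchanged(r, tolerance=0.02))

# A 0.5% per-cycle advantage compounded over 20, 30 and 40 cycles.
for n in (20, 30, 40):
    print(n, round(compounded_bias(0.005, n), 3))
```

In this toy model, a 0.5% per-cycle advantage grows to roughly a 16% distortion of the allele ratio after 30 cycles, which is the compounding effect the paragraph describes.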

In an embodiment, once a mixture has been preferentially enriched at the set of target loci, it may be sequenced using any one of the previous, current, or next generation of sequencing instruments that sequence a clonal sample (a sample generated from a single molecule; examples include ILLUMINA GAIIx, ILLUMINA HISEQ, and LIFE TECHNOLOGIES SOLiD 5500XL). The ratios can be evaluated by sequencing through the specific alleles within the targeted region. These sequencing reads can be analyzed and counted according to allele type, and the ratios of the different alleles determined accordingly. For variations that are one to a few bases in length, detection of the alleles will be performed by sequencing, and it is essential that the sequencing read span the allele in question in order to evaluate the allelic composition of that captured molecule. The total number of captured molecules assayed for the genotype can be increased by increasing the length of the sequencing read. Full sequencing of all molecules would guarantee collection of the maximum amount of data available in the enriched pool. However, sequencing is currently expensive, and a method that can measure allele distributions using a lower number of sequence reads will have great value. In addition, there are technical limitations to the maximum possible read length, as well as accuracy limitations as read lengths increase. The alleles of greatest utility will be one to a few bases in length, but theoretically any allele shorter than the length of the sequencing read can be used. While allele variations come in all types, the examples provided herein focus on SNPs or variants consisting of just a few neighboring base pairs. Larger variants, such as segmental copy number variants, can in many cases be detected by aggregating these smaller variations, as whole collections of SNPs internal to the segment are duplicated. Variants larger than a few bases, such as STRs, require special consideration; some targeting approaches work for them while others will not.
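Counting allele reads at a single-base variant uses only the reads that actually span the variant position. The sketch below assumes reads are already aligned, are given as hypothetical (start, sequence) tuples with 0-based reference coordinates, and contain no insertions or deletions; a real pipeline would instead work from alignment records.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_alleles_at(reads: Iterable[Tuple[int, str]], snp_pos: int) -> Counter:
    """Count the base observed at snp_pos across gap-free aligned reads.

    Reads that do not span snp_pos carry no allele information and are skipped.
    """
    counts: Counter = Counter()
    for start, seq in reads:
        offset = snp_pos - start
        if 0 <= offset < len(seq):
            counts[seq[offset]] += 1
    return counts


def allele_fractions(counts: Counter) -> Dict[str, float]:
    """Convert allele counts into fractions of the reads that span the site."""
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()} if total else {}


reads = [(0, "ACGTA"), (2, "GTTC"), (10, "AAAA"), (1, "CGCA")]
counts = count_alleles_at(reads, snp_pos=3)
print(counts, allele_fractions(counts))
```

Here the third read starts after the site and is ignored, so the allele fractions are computed from three spanning reads only.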

There are multiple targeting approaches that can be used to specifically isolate and enrich one or a plurality of variant positions in the genome. Typically, these rely on taking advantage of the invariant sequence flanking the variant sequence. There is prior art related to targeting in the context of sequencing where the substrate is maternal plasma (see, e.g., Liao et al., Clin. Chem. 2011; 57(1): pp. 92-101). However, the approaches in the prior art all use targeting probes that target exons, and do not focus on targeting polymorphic regions of the genome. In an embodiment, a method of the present disclosure involves using targeting probes that focus exclusively or almost exclusively on polymorphic regions. In an embodiment, a method of the present disclosure involves using targeting probes that focus exclusively or almost exclusively on SNPs. In some embodiments of the present disclosure, the targeted polymorphic sites consist of at least 10% SNPs, at least 20% SNPs, at least 30% SNPs, at least 40% SNPs, at least 50% SNPs, at least 60% SNPs, at least 70% SNPs, at least 80% SNPs, at least 90% SNPs, at least 95% SNPs, at least 98% SNPs, at least 99% SNPs, at least 99.9% SNPs, or exclusively SNPs.

In an embodiment, a method of the present disclosure can be used to determine genotypes (the base composition of the DNA at specific loci) and the relative proportions of those genotypes from a mixture of DNA molecules, where those DNA molecules may have originated from one or a number of genetically distinct individuals. In an embodiment, a method of the present disclosure can be used to determine the genotypes at a set of polymorphic loci, and the relative ratios of the amounts of different alleles present at those loci. In an embodiment, the polymorphic loci may consist entirely of SNPs. In an embodiment, the polymorphic loci can comprise SNPs, short tandem repeats, and other polymorphisms. In an embodiment, a method of the present disclosure can be used to determine the relative distributions of alleles at a set of polymorphic loci in a mixture of DNA, where the mixture of DNA comprises DNA that originates from a transplant recipient and DNA that originates from a transplant. In an embodiment, the joint allele distributions can be determined on a mixture of DNA isolated from the blood of a transplant recipient. In an embodiment, the allele distributions at a set of loci can be used to determine the rejection and/or injury status of a transplant.
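As one illustration of how allele distributions at a set of loci can reflect the amount of transplant-derived DNA, the sketch below estimates a donor DNA fraction f from loci where the recipient is homozygous. It is a simplified, hypothetical estimator, not the method described in this disclosure: it assumes recipient and donor genotypes are known, that each genome contributes DNA in proportion to its genome copies, and that there is no sequencing error. Under those assumptions the donor-specific allele is expected at a fraction of about f/2 if the donor is heterozygous and about f if the donor is homozygous.

```python
from typing import Dict, Iterable, Tuple

Genotype = Tuple[str, str]
Locus = Tuple[Genotype, Genotype, Dict[str, int]]  # (recipient, donor, reads per allele)


def estimate_donor_fraction(loci: Iterable[Locus]) -> float:
    """Pooled estimate of the donor DNA fraction f from recipient-homozygous loci."""
    donor_reads = 0.0
    expected_weight = 0.0
    for recipient, donor, counts in loci:
        if recipient[0] != recipient[1]:
            continue  # only recipient-homozygous loci are used in this sketch
        r = recipient[0]
        non_recipient = [a for a in donor if a != r]
        if not non_recipient:
            continue  # donor carries only the recipient allele: no donor-specific signal
        depth = sum(counts.values())
        # Each donor allele copy not shared with the recipient contributes ~f/2 of the reads.
        weight = len(non_recipient) / 2
        donor_reads += sum(counts.get(a, 0) for a in set(non_recipient))
        expected_weight += weight * depth
    if expected_weight == 0:
        raise ValueError("no informative recipient-homozygous loci")
    return donor_reads / expected_weight


example = [
    (("A", "A"), ("A", "G"), {"A": 950, "G": 50}),   # donor heterozygous: G ~ f/2
    (("C", "C"), ("T", "T"), {"C": 900, "T": 100}),  # donor homozygous: T ~ f
    (("A", "G"), ("A", "A"), {"A": 600, "G": 400}),  # recipient heterozygous: skipped
    (("A", "A"), ("A", "A"), {"A": 1000}),           # uninformative: skipped
]
print(estimate_donor_fraction(example))
```

Pooling reads across loci, rather than averaging per-locus ratios, gives deeper loci proportionally more weight in the estimate.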

In an embodiment, the mixture of DNA molecules could be derived from DNA extracted from multiple cells of one individual. In an embodiment, the original collection of cells from which the DNA is derived may comprise a mixture of diploid or haploid cells of the same or of different genotypes, if that individual is mosaic (germline or somatic). In an embodiment, the mixture of DNA molecules could also be derived from DNA extracted from single cells. In an embodiment, the mixture of DNA molecules could also be derived from DNA extracted from a mixture of two or more cells of the same individual, or of different individuals. In an embodiment, the mixture of DNA molecules could be derived from DNA isolated from biological material that has already been liberated from cells, such as blood plasma, which is known to contain cell-free DNA. In an embodiment, this biological material may be a mixture of DNA from one or more individuals, as is the case during pregnancy, where it has been shown that fetal DNA is present in the mixture. In an embodiment, the biological material could be from a mixture of cells that were found in transplant recipient blood, where some of the cells originate from the transplant.

Circularizing Probes

Some embodiments of the present disclosure involve the use of “Linked Inverted Probes” (LIPs), which have been previously described in the literature. LIPs is a generic term meant to encompass technologies that involve the creation of a circular molecule of DNA, where the probes are designed to hybridize to the targeted region of DNA on either side of a targeted allele, such that addition of appropriate polymerases and/or ligases, and the appropriate conditions, buffers and other reagents, will complete the complementary, inverted region of DNA across the targeted allele to create a circular loop of DNA that captures the information found in the targeted allele. LIPs may also be called pre-circularized probes, pre-circularizing probes, or circularizing probes. The LIPs probe may be a linear DNA molecule between 50 and 500 nucleotides in length, and in an embodiment between 70 and 100 nucleotides in length; in some embodiments, it may be longer or shorter than described herein. Other embodiments of the present disclosure involve different incarnations of the LIPs technology, such as Padlock Probes and MOLECULAR INVERSION PROBES (MIPs).

One method to target specific locations for sequencing is to synthesize probes in which the 3′ and 5′ ends of the probes anneal to target DNA at locations adjacent to and on either side of the targeted region, in an inverted manner, such that the addition of DNA polymerase and DNA ligase results in extension from the 3′ end, adding bases to the single-stranded probe that are complementary to the target molecule (gap-fill), followed by ligation of the new 3′ end to the 5′ end of the original probe, resulting in a circular DNA molecule that can be subsequently isolated from background DNA. The probe ends are designed to flank the targeted region of interest. One implementation of this approach is commonly called MIPs and has been used in conjunction with array technologies to determine the nature of the sequence filled in. One drawback to the use of MIPs in the context of measuring allele ratios is that the hybridization, circularization and amplification steps do not happen at equal rates for different alleles at the same locus. This results in measured allele ratios that are not representative of the actual allele ratios present in the original mixture.

In an embodiment, the circularizing probes are constructed such that the region of the probe that is designed to hybridize upstream of the targeted polymorphic locus and the region of the probe that is designed to hybridize downstream of the targeted polymorphic locus are covalently connected through a non-nucleic acid backbone. This backbone can be any biocompatible molecule or combination of biocompatible molecules. Some examples of possible biocompatible molecules are poly(ethylene glycol), polycarbonates, polyurethanes, polyethylenes, polypropylenes, sulfone polymers, silicone, cellulose, fluoropolymers, acrylic compounds, styrene block copolymers, and other block copolymers.

In an embodiment of the present disclosure, this approach has been modified to be easily amenable to sequencing as a means of interrogating the filled-in sequence. In order to retain the original allelic proportions of the original sample, at least the following considerations must be taken into account. The variable positions among different alleles in the gap-fill region must not be too close to the probe binding sites, as there can be initiation bias by the DNA polymerase, resulting in differential representation of the variants. Another consideration is that additional variations may be present in the probe binding sites that are correlated to the variants in the gap-fill region, which can result in unequal amplification from different alleles. In an embodiment of the present disclosure, the 3′ ends and 5′ ends of the pre-circularized probe are designed to hybridize to bases that are one or a few positions away from the variant positions (polymorphic sites) of the targeted allele. The number of bases between the polymorphic site (SNP or otherwise) and the base to which the 3′ end and/or 5′ end of the pre-circularized probe is designed to hybridize may be one base, two bases, three bases, four bases, five bases, six bases, seven to ten bases, eleven to fifteen bases, sixteen to twenty bases, twenty to thirty bases, or thirty to sixty bases. The forward and reverse primers (the two probe ends) may be designed to hybridize a different number of bases away from the polymorphic site. Circularizing probes can be generated in large numbers with current DNA synthesis technology, allowing very large numbers of probes to be generated and potentially pooled, enabling interrogation of many loci simultaneously. The approach has been reported to work with more than 300,000 probes. Two papers that discuss a method involving circularizing probes that can be used to measure the genomic data of the target individual are Porreca et al., Nature Methods, 2007, 4(11), pp. 931-936; and Turner et al., Nature Methods, 2009, 6(5), pp. 315-316. The methods described in these papers may be used in combination with other methods described herein. Certain steps of the method from these two papers may be used in combination with other steps from other methods described herein.
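The spacing rule for the probe ends can be expressed as a simple design check. In this sketch, which is illustrative and not taken from the cited papers, the two probe arms are described only by where their hybridized footprints stop and start on the target, using hypothetical 0-based coordinates. The 1-to-60-base window mirrors the range listed above, but the class, function names, and defaults are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CircularizingProbe:
    upstream_arm_end: int      # 0-based, exclusive end of the upstream arm footprint
    downstream_arm_start: int  # 0-based start of the downstream arm footprint


def arm_spacing(probe: CircularizingProbe, snp_pos: int) -> Tuple[int, int]:
    """Bases strictly between each arm's nearest hybridized base and the SNP."""
    if not probe.upstream_arm_end <= snp_pos < probe.downstream_arm_start:
        raise ValueError("the polymorphic site must lie inside the gap-fill region")
    upstream_gap = snp_pos - probe.upstream_arm_end
    downstream_gap = probe.downstream_arm_start - snp_pos - 1
    return upstream_gap, downstream_gap


def spacing_ok(probe: CircularizingProbe, snp_pos: int,
               min_gap: int = 1, max_gap: int = 60) -> bool:
    """Keep the SNP away from both probe ends, but not so far that the gap-fill is long."""
    return all(min_gap <= g <= max_gap for g in arm_spacing(probe, snp_pos))


probe = CircularizingProbe(upstream_arm_end=100, downstream_arm_start=106)
print(arm_spacing(probe, 103), spacing_ok(probe, 103), spacing_ok(probe, 100))
```

A SNP at position 100 would sit directly against the upstream arm, so the check rejects it, reflecting the polymerase initiation-bias concern described above.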

In some embodiments of the methods disclosed herein, the genetic material of the target individual is optionally amplified, followed by hybridization of the pre-circularized probes, performing a gap fill to fill in the bases between the two ends of the hybridized probes, ligating the two ends to form a circularized probe, and amplifying the circularized probe using, for example, rolling circle amplification. Once the desired target allelic genetic information is captured by circularizing appropriately designed oligonucleotide probes, such as in the LIPs system, the genetic sequence of the circularized probes may be measured to give the desired sequence data. In an embodiment, the appropriately designed oligonucleotide probes may be circularized directly on unamplified genetic material of the target individual, and amplified afterwards. Note that a number of amplification procedures may be used to amplify the original genetic material or the circularized LIPs, including rolling circle amplification, MDA, or other amplification protocols. Different methods may be used to measure the genetic information on the target genome, for example high-throughput sequencing, Sanger sequencing, other sequencing methods, capture-by-hybridization, capture-by-circularization, multiplex PCR, other hybridization methods, and combinations thereof.

Once the genetic material of the individual has been measured using one or a combination of the above methods, an informatics-based method, along with the appropriate genetic measurements, can then be used to determine the transplant status of a transplant recipient.

Applying an informatics-based method to determine the transplant status of a transplant recipient from genetic data as measured by hybridization arrays, such as the ILLUMINA INFINIUM array or the AFFYMETRIX gene chip, has been described in documents referenced elsewhere in this document. However, the method described herein shows improvements over methods described previously in the literature. For example, the LIPs-based approach followed by high-throughput sequencing unexpectedly provides better genotypic data due to the approach having better capacity for multiplexing, better capture specificity, better uniformity, and lower allelic bias. Greater multiplexing allows more alleles to be targeted, giving more accurate results. Better uniformity results in more of the targeted alleles being measured, giving more accurate results. Lower rates of allelic bias result in lower rates of miscalls, giving more accurate results. More accurate results lead to improved clinical outcomes and better medical care.

It is important to note that LIPs may be used as a method for targeting specific loci in a sample of DNA for genotyping by methods other than sequencing. For example, LIPs may be used to target DNA for genotyping using SNP arrays or other DNA- or RNA-based microarrays.

Ligation-Mediated PCR

Ligation-mediated PCR is a method of PCR used to preferentially enrich a sample of DNA by amplifying one or a plurality of loci in a mixture of DNA, the method comprising: obtaining a set of primer pairs, where each primer in the pair contains a target-specific sequence and a non-target sequence, where the target-specific sequence is designed to anneal to a target region, one upstream and one downstream from the polymorphic site, and which can be separated from the polymorphic site by 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11-20, 21-30, 31-40, 41-50, 51-100, or more than 100 bases; polymerization of the DNA from the 3′ end of the upstream primer to fill the single-stranded region between it and the 5′ end of the downstream primer with nucleotides complementary to the target molecule; ligation of the last polymerized base of the upstream primer to the adjacent 5′ base of the downstream primer; and amplification of only polymerized and ligated molecules using the non-target sequences contained at the 5′ end of the upstream primer and the 3′ end of the downstream primer. Pairs of primers to distinct targets may be mixed in the same reaction. The non-target sequences serve as universal sequences, such that all pairs of primers that have been successfully polymerized and ligated may be amplified with a single pair of amplification primers.
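The polymerize, ligate, and amplify logic of ligation-mediated PCR can be mimicked at the string level. This is a toy model under loud simplifications: annealing is treated as an exact substring match on the same strand as the template, all sequences are written 5′ to 3′, and the bracketed tails are hypothetical stand-ins for the universal non-target sequences. None of the names come from a real library.

```python
from typing import Optional


def lm_pcr_product(template: str, upstream: str, downstream: str,
                   tail_5: str, tail_3: str) -> Optional[str]:
    """Return the ligated, universally tailed product, or None if no product forms."""
    u = template.find(upstream)
    if u < 0:
        return None  # upstream target-specific sequence does not anneal
    fill_start = u + len(upstream)
    d = template.find(downstream, fill_start)
    if d < 0:
        return None  # downstream target-specific sequence does not anneal
    gap_fill = template[fill_start:d]  # polymerase fills the gap; ligase seals the nick
    # Only polymerized-and-ligated molecules carry both tails, so only they amplify.
    return tail_5 + upstream + gap_fill + downstream + tail_3


ref = "TTTTACGTAGCATGCCCCGGGTTTT"
alt = "TTTTACGTAGTATGCCCCGGGTTTT"  # same locus with a C>T change inside the gap
for template in (ref, alt):
    print(lm_pcr_product(template, "ACGT", "CCCC", "[U1]", "[U2]"))
```

Because the allele lies in the gap-fill region, both alleles are copied by the same primer pair and are distinguished only in the sequenced product.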

Capture by Hybridization

Preferential enrichment of a specific set of sequences in a target genome can be accomplished in a number of ways. Elsewhere in this document is a description of how LIPs can be used to target a specific set of sequences, but in all of those applications, other targeting and/or preferential enrichment methods can be used equally well for the same ends. One example of another targeting method is the capture-by-hybridization approach. Some examples of commercial capture-by-hybridization technologies include AGILENT's SURE SELECT and ILLUMINA's TRUSEQ. In capture by hybridization, a set of oligonucleotides that is complementary or mostly complementary to the desired targeted sequences is allowed to hybridize to a mixture of DNA, and is then physically separated from the mixture. Once the desired sequences have hybridized to the targeting oligonucleotides, the effect of physically removing the targeting oligonucleotides is to also remove the targeted sequences. Once the hybridized oligos are removed, they can be heated to above their melting temperature to release the targeted sequences, which can then be amplified. One way to physically remove the targeting oligonucleotides is by covalently bonding them to a solid support, for example a magnetic bead or a chip. Another way is by covalently bonding them to a molecular moiety with a strong affinity for another molecular moiety. An example of such a molecular pair is biotin and streptavidin, as is used in SURE SELECT. Thus the targeting oligonucleotides could be covalently attached to a biotin molecule, and after hybridization, a solid support with streptavidin affixed can be used to pull down the biotinylated oligonucleotides, to which the targeted sequences are hybridized.

Hybrid capture involves hybridizing probes that are complementary to the targets of interest to the target molecules. Hybrid capture probes were originally developed to target and enrich large fractions of the genome with relative uniformity between targets. In that application, it was important that all targets be amplified with enough uniformity that all regions could be detected by sequencing; however, no regard was paid to retaining the proportion of alleles in the original sample. Following capture, the alleles present in the sample can be determined by direct sequencing of the captured molecules. These sequencing reads can be analyzed and counted according to allele type. However, using the current technology, the measured allele distributions of the captured sequences are typically not representative of the original allele distributions.

In an embodiment, detection of the alleles is performed by sequencing. In order to capture the allele identity at the polymorphic site, it is essential that the sequencing read span the allele in question in order to evaluate the allelic composition of that captured molecule. Because the captured molecules are often of variable length, sequencing reads cannot be guaranteed to overlap the variant positions unless the entire molecule is sequenced. However, cost considerations, as well as technical limitations on the maximum possible length and accuracy of sequencing reads, make sequencing the entire molecule unfeasible. In an embodiment, increasing the read length from about 30 to about 50 or about 70 bases can greatly increase the number of reads that overlap the variant positions within the targeted sequences.

Another way to increase the number of reads that interrogate the position of interest is to decrease the length of the probe, as long as it does not result in bias in the underlying enriched alleles. The synthesized probe should be long enough that two probes designed to hybridize to two different alleles found at one locus will hybridize with near-equal affinity to the various alleles in the original sample. Currently, methods known in the art describe probes that are typically longer than 120 bases. In an embodiment, if the allele is one or a few bases, then the capture probes may be less than about 110 bases, less than about 100 bases, less than about 90 bases, less than about 80 bases, less than about 70 bases, less than about 60 bases, less than about 50 bases, less than about 40 bases, less than about 30 bases, or less than about 25 bases, and this is sufficient to ensure equal enrichment from all alleles. When the mixture of DNA that is to be enriched using the hybrid capture technology is a mixture comprising free-floating DNA isolated from blood, for example maternal blood, the average length of DNA is quite short, typically less than 200 bases. The use of shorter probes results in a greater chance that the hybrid capture probes will capture the desired DNA fragments. Larger variations may require longer probes. In an embodiment, the variations of interest are one (a SNP) to a few bases in length. In an embodiment, targeted regions in the genome can be preferentially enriched using hybrid capture probes of a length below 90 bases, which can be less than 80 bases, less than 70 bases, less than 60 bases, less than 50 bases, less than 40 bases, less than 30 bases, or less than 25 bases. In an embodiment, to increase the chance that the desired allele is sequenced, the length of the probe that is designed to hybridize to the regions flanking the polymorphic allele location can be decreased from above 90 bases to about 80 bases, about 70 bases, about 60 bases, about 50 bases, about 40 bases, about 30 bases, or about 25 bases.

There is a minimum overlap between the synthesized probe and the target molecule that is required to enable capture. The synthesized probe can be made as short as possible while still being longer than this minimum required overlap. The effect of using a shorter probe length to target a polymorphic region is that more of the captured molecules will overlap the target allele region. The state of fragmentation of the original DNA molecules also affects the number of reads that will overlap the targeted alleles. Some DNA samples, such as plasma samples, are already fragmented due to biological processes that take place in vivo. However, samples with longer fragments may benefit from fragmentation prior to sequencing library preparation and enrichment. When both probes and fragments are short (about 60-80 bp), maximum specificity may be achieved, with relatively few sequence reads failing to overlap the critical region of interest.
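The effect of probe length, fragment length, and read length on how many reads interrogate the polymorphic site can be explored by brute-force enumeration. This sketch rests on assumptions not stated in the text: fragment start positions are uniform, a fragment is captured when it overlaps a probe centered on the SNP by at least a fixed minimum, and only a single-end read from the fragment's left end is considered. The parameter values and function names are hypothetical.

```python
from typing import Tuple


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Length of the overlap between half-open intervals [a_start, a_end) and [b_start, b_end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def capture_stats(probe_len: int, frag_len: int, read_len: int,
                  min_overlap: int, snp: int = 0) -> Tuple[float, float]:
    """Return (fraction of captured fragments containing the SNP,
    fraction of captured fragments whose read covers the SNP)."""
    if not 0 < min_overlap <= min(probe_len, frag_len):
        raise ValueError("min_overlap must be positive and fit within probe and fragment")
    p_start = snp - probe_len // 2
    p_end = p_start + probe_len
    captured = contains = read_hits = 0
    # Every start position at which a fragment could touch the probe.
    for s in range(p_start - frag_len, p_end + 1):
        e = s + frag_len
        if overlap(s, e, p_start, p_end) < min_overlap:
            continue
        captured += 1
        if s <= snp < e:
            contains += 1
        if s <= snp < s + min(read_len, frag_len):
            read_hits += 1
    return contains / captured, read_hits / captured


for probe_len, frag_len in ((120, 160), (60, 160), (60, 70)):
    contains, hits = capture_stats(probe_len, frag_len, read_len=50, min_overlap=30)
    print(f"probe {probe_len}, fragment {frag_len}: "
          f"{contains:.2f} contain SNP, {hits:.2f} reads cover it")
```

Under these assumptions, shortening the probe from 120 to 60 bases raises the share of captured fragments whose read covers the SNP from about 0.23 to about 0.31, and pairing a 60-base probe with 70-base fragments raises it to about 0.70, consistent with the qualitative argument above.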

In an embodiment, the hybridization conditions can be adjusted to maximize uniformity in the capture of different alleles present in the original sample. In an embodiment, hybridization temperatures are decreased to minimize differences in hybridization bias between alleles. Methods known in the art avoid using lower temperatures for hybridization because lowering the temperature has the effect of increasing hybridization of probes to unintended targets. However, when the goal is to preserve allele ratios with maximum fidelity, the approach of using lower hybridization temperatures provides optimally accurate allele ratios, despite the fact that the current art teaches away from this approach. Hybridization temperature can also be increased to require greater overlap between the target and the synthesized probe, so that only targets with substantial overlap of the targeted region are captured. In some embodiments of the present disclosure, the hybridization temperature is lowered from the normal hybridization temperature to about 40° C., about 45° C., about 50° C., about 55° C., about 60° C., about 65° C., or about 70° C.

In an embodiment, the hybrid capture probes can be designed such that the region of the capture probe with DNA that is complementary to the DNA found in regions flanking the polymorphic allele is not immediately adjacent to the polymorphic site. Instead, the capture probe can be designed such that the region of the capture probe that is designed to hybridize to the DNA flanking the polymorphic site of the target is separated from the portion of the capture probe that will be in van der Waals contact with the polymorphic site by a small distance that is equivalent in length to one or a small number of bases. In an embodiment, the hybrid capture probe is designed to hybridize to a region that is flanking the polymorphic allele but does not cross it; this may be termed a flanking capture probe. The length of the flanking capture probe may be less than about 120 bases, less than about 110 bases, less than about 100 bases, less than about 90 bases, less than about 80 bases, less than about 70 bases, less than about 60 bases, less than about 50 bases, less than about 40 bases, less than about 30 bases, or less than about 25 bases. The region of the genome that is targeted by the flanking capture probe may be separated from the polymorphic locus by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11-20, or more than 20 base pairs.
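The footprint of a flanking capture probe can be computed from the SNP position, the probe length, and the chosen separation. The sketch below uses hypothetical 0-based, half-open coordinates on the target strand and returns only the genomic footprint the probe is designed against; the actual probe sequence would be the complement of that footprint, and the function name is illustrative.

```python
from typing import Tuple


def flanking_probe(snp_pos: int, probe_len: int, gap: int, side: str) -> Tuple[int, int]:
    """Half-open [start, end) footprint of a probe that flanks, but does not cross, a SNP.

    `gap` is the number of bases left uncovered between the probe and the SNP.
    """
    if gap < 0 or probe_len <= 0:
        raise ValueError("gap must be >= 0 and probe_len > 0")
    if side == "left":
        end = snp_pos - gap  # last covered base is snp_pos - gap - 1
        return end - probe_len, end
    if side == "right":
        start = snp_pos + gap + 1  # first covered base after the SNP and the gap
        return start, start + probe_len
    raise ValueError("side must be 'left' or 'right'")


print(flanking_probe(1000, 60, 3, "left"))   # (937, 997)
print(flanking_probe(1000, 60, 3, "right"))  # (1004, 1064)
```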

A targeted capture-based disease screening test may use custom targeted sequence capture, such as that currently offered by AGILENT (SURE SELECT), ROCHE-NIMBLEGEN, or ILLUMINA. Capture probes can be custom designed to ensure capture of various types of mutations. For point mutations, one or more probes that overlap the point mutation should be sufficient to capture and sequence the mutation.

For small insertions or deletions, one or more probes that overlap the mutation may be sufficient to capture and sequence fragments comprising the mutation. However, because probes are typically designed to the reference genome sequence, hybridization between the probe and a fragment carrying the mutation may be less efficient, limiting capture efficiency. To ensure capture of fragments comprising the mutation, one could design two probes, one matching the normal allele and one matching the mutant allele. A longer probe may enhance hybridization. Multiple overlapping probes may enhance capture. Finally, placing a probe immediately adjacent to, but not overlapping, the mutation may permit relatively similar capture efficiency of the normal and mutant alleles.

For Simple Tandem Repeats (STRs), a probe overlapping these highly variable sites is unlikely to capture the fragment well. To enhance capture, a probe could be placed adjacent to, but not overlapping, the variable site. The fragment could then be sequenced as normal to reveal the length and composition of the STR.
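The sketch below illustrates how the length of an STR might be read off a sequenced fragment once the probe sits in the adjacent flank. It assumes the flanking sequences are unique within the read and that the repeat is uninterrupted; `str_repeat_units` and the example sequences are hypothetical.

```python
from __future__ import annotations


def str_repeat_units(read: str, left_flank: str, right_flank: str, motif: str) -> int | None:
    """Count uninterrupted copies of `motif` between two unique flanking sequences.

    Returns None if either flank is missing or the region between them is not
    a pure run of the motif (for example, an interrupted repeat).
    """
    i = read.find(left_flank)
    if i < 0:
        return None
    start = i + len(left_flank)
    end = read.find(right_flank, start)
    if end < 0:
        return None
    region = read[start:end]
    units, remainder = divmod(len(region), len(motif))
    if remainder or region != motif * units:
        return None
    return units


read = "ACGTTG" + "CA" * 7 + "GGATCC"
print(str_repeat_units(read, "ACGTTG", "GGATCC", "CA"))  # 7
```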

For large deletions, a series of overlapping probes, a common approach currently used in exome capture systems, may work. However, with this approach it may be difficult to determine whether or not an individual is heterozygous. Targeting and evaluating SNPs within the captured region could potentially reveal loss of heterozygosity across the region, indicating that an individual is a carrier. In an embodiment, it is possible to place non-overlapping or singleton probes across the potentially deleted region and use the number of fragments captured as a measure of heterozygosity. Where an individual carries a large deletion, one-half the number of fragments is expected to be available for capture relative to a non-deleted (diploid) reference locus. Consequently, the number of reads obtained from the deleted region should be roughly half that obtained from a normal diploid locus. Aggregating and averaging the sequencing read depth from multiple singleton probes across the potentially deleted region may enhance the signal and improve confidence in the diagnosis. The two approaches, targeting SNPs to identify loss of heterozygosity and using multiple singleton probes to obtain a quantitative measure of the number of underlying fragments from that locus, can also be combined. Either or both of these strategies may be combined with other strategies to achieve the same end.
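A minimal sketch of the read-depth comparison, assuming that depths from the singleton probes in the candidate region and from diploid reference loci have already been counted for the same sample. The cut-offs of 0.25 and 0.75 are hypothetical, chosen halfway between the expected ratios of 0 (homozygous deletion), 0.5 (carrier) and 1 (no deletion).

```python
from __future__ import annotations

from statistics import mean


def depth_ratio(probe_depths: list[float], reference_depths: list[float]) -> float:
    """Mean read depth over singleton probes in the candidate region, relative
    to the mean depth over probes at diploid reference loci in the same sample."""
    return mean(probe_depths) / mean(reference_depths)


def classify_deletion(ratio: float) -> str:
    # Hypothetical cut-offs halfway between the expected ratios 0.0, 0.5 and 1.0.
    if ratio < 0.25:
        return "homozygous deletion"
    if ratio < 0.75:
        return "heterozygous deletion (carrier)"
    return "no deletion detected"


region = [48, 52, 50, 47, 53]
reference = [100, 98, 102, 101, 99]
r = depth_ratio(region, reference)
print(r, classify_deletion(r))  # 0.5 heterozygous deletion (carrier)
```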

There are a number of ways to decrease depth of read (DOR) variability: for example, one could increase primer concentrations, use longer targeted amplification probes, or run more STA cycles (such as more than 25, more than 30, more than 35, or even more than 40).

Targeted PCR

In some embodiments, PCR can be used to target specific locations of the genome. In plasma samples, the original DNA is highly fragmented (typically less than 500 bp, with an average length less than 200 bp). In PCR, both forward and reverse primers must anneal to the same fragment to enable amplification. Therefore, if the fragments are short, the PCR assays must amplify relatively short regions as well. As with MIPS, if the polymorphic positions are too close to the polymerase binding site, the amplification of different alleles could be biased. Currently, PCR primers that target polymorphic regions, such as those containing SNPs, are typically designed such that the 3′ end of the primer hybridizes to the base immediately adjacent to the polymorphic base or bases. In an embodiment of the present disclosure, the 3′ ends of both the forward and reverse PCR primers are designed to hybridize to bases that are one or a few positions away from the variant positions (polymorphic sites) of the targeted allele. The number of bases between the polymorphic site (SNP or otherwise) and the base to which the 3′ end of the primer is designed to hybridize may be one, two, three, four, five, or six bases, seven to ten bases, eleven to fifteen bases, or sixteen to twenty bases. The forward and reverse primers may be designed to hybridize a different number of bases away from the polymorphic site.
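As an illustration of the offset design, the sketch below places the forward and reverse primer binding sites a chosen number of bases away from a SNP and reports the resulting amplicon length, taken as the distance between the 5′ ends of the two priming sites. Coordinates are assumed to be 0-based, half-open, and on the plus strand; `place_primers` is a hypothetical helper.

```python
from __future__ import annotations


def place_primers(snp_pos: int, fwd_len: int, rev_len: int,
                  fwd_gap: int, rev_gap: int) -> dict[str, tuple[int, int] | int]:
    """Place forward/reverse primer binding sites (0-based, half-open, plus-strand
    coordinates) so that `fwd_gap` / `rev_gap` bases separate each primer's 3'
    end from the polymorphic site."""
    fwd_end = snp_pos - fwd_gap            # forward 3' end binds base fwd_end - 1
    fwd = (fwd_end - fwd_len, fwd_end)
    rev_start = snp_pos + rev_gap + 1      # reverse primer's 3' end binds base rev_start
    rev = (rev_start, rev_start + rev_len)
    # Amplicon length: distance between the 5' ends of the two priming sites.
    return {"forward": fwd, "reverse": rev, "amplicon": rev[1] - fwd[0]}


print(place_primers(1000, 20, 20, 1, 2))
# {'forward': (979, 999), 'reverse': (1003, 1023), 'amplicon': 44}
```

Allowing the two gaps to differ reflects the statement that the forward and reverse primers may sit a different number of bases from the polymorphic site.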

PCR assays can be generated in large numbers; however, the interactions between different PCR assays make it difficult to multiplex them beyond about one hundred assays. Various complex molecular approaches can be used to increase the level of multiplexing, but it may still be limited to fewer than 100, perhaps 200, or possibly 500 assays per reaction. Samples with large quantities of DNA can be split among multiple sub-reactions and then recombined before sequencing. For samples where either the overall sample or some subpopulation of DNA molecules is limited, splitting the sample would introduce statistical noise. In an embodiment, a small or limited quantity of DNA may refer to an amount below 10 pg, between 10 and 100 pg, between 100 pg and 1 ng, between 1 and 10 ng, or between 10 and 100 ng. Note that while this method is particularly useful on small amounts of DNA, where other methods that involve splitting into multiple pools can cause significant problems related to introduced stochastic noise, it still provides the benefit of minimizing bias when run on samples with any quantity of DNA. When DNA is limited, a universal pre-amplification step may be used to increase the overall sample quantity. Ideally, this pre-amplification step should not appreciably alter the allelic distributions.
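To put the mass ranges above in terms of molecules, the sketch converts a DNA mass to an approximate number of haploid human genome copies. It assumes about 3.3 pg per haploid human genome, a commonly used approximation; fragmented samples such as plasma cfDNA will also contain copies that do not span a given assay.

```python
# Assumption: ~3.3 pg of DNA per haploid human genome (a commonly used approximation).
PG_PER_HAPLOID_GENOME = 3.3


def haploid_genome_equivalents(mass_pg: float) -> float:
    """Approximate number of haploid human genome copies in `mass_pg` picograms
    of DNA; each copy carries one copy of every single-copy locus."""
    return mass_pg / PG_PER_HAPLOID_GENOME


for label, pg in [("10 pg", 10), ("100 pg", 100), ("1 ng", 1_000), ("10 ng", 10_000)]:
    print(label, round(haploid_genome_equivalents(pg)))
```

Under this assumption the boundaries of the ranges correspond to roughly 3, 30, 303 and 3030 haploid copies of each locus, which is why splitting such samples across many reactions quickly leaves some reactions with few or no copies.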

In an embodiment, a method of the present disclosure can generate PCR products that are specific to a large number of targeted loci, specifically 1,000 to 5,000 loci, 5,000 to 10,000 loci, or more than 10,000 loci, for genotyping by sequencing or some other genotyping method, from limited samples such as single cells or DNA from body fluids. Currently, performing multiplex PCR reactions of more than 5 to 10 targets presents a major challenge and is often hindered by primer side products, such as primer dimers, and other artifacts. When detecting target sequences using microarrays with hybridization probes, primer dimers and other artifacts may be ignored, as these are not detected. However, when using sequencing as a method of detection, the vast majority of the sequencing reads would sequence such artifacts and not the desired target sequences in a sample. Methods described in the prior art that multiplex more than 50 or 100 reactions in one reaction followed by sequencing will typically result in more than 20%, often more than 50%, in many cases more than 80%, and in some cases more than 90% off-target sequence reads.

In general, to perform targeted sequencing of multiple (n) targets of a sample (greater than 50, greater than 100, greater than 500, or greater than 1,000), one can split the sample into a number of parallel reactions that each amplify one individual target. This has been performed in PCR multiwell plates or can be done in commercial platforms such as the FLUIDIGM ACCESS ARRAY (48 reactions per sample in microfluidic chips) or DROPLET PCR by RAIN DANCE TECHNOLOGY (100s to a few thousand targets). Unfortunately, these split-and-pool methods are problematic for samples with a limited amount of DNA, as there are often not enough copies of the genome to ensure that there is one copy of each region of the genome in each well. This is an especially severe problem when polymorphic loci are targeted and the relative proportions of the alleles at the polymorphic loci are needed, as the stochastic noise introduced by the splitting and pooling will make the measurements of the proportions of the alleles present in the original sample of DNA very inaccurate. Described here is a method to effectively and efficiently amplify many PCR reactions that is applicable to cases where only a limited amount of DNA is available. In an embodiment, the method may be applied to the analysis of single cells, body fluids, mixtures of DNA such as the free floating DNA found in transplant recipient plasma, biopsies, and environmental and/or forensic samples.
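The splitting problem can be made concrete with a simple occupancy model, assuming each template copy of a locus lands in one of the wells independently and uniformly at random. Under that assumption, the chance that a given well receives no copy is (1 − 1/k)^N for N copies and k wells, which is approximately e^(−N/k). The function names are illustrative.

```python
import math


def prob_well_empty(copies: int, wells: int) -> float:
    """Probability that a given well receives no copy of a given locus when
    `copies` template molecules are distributed uniformly at random over `wells`."""
    return (1 - 1 / wells) ** copies


def expected_empty_wells(copies: int, wells: int) -> float:
    """Expected number of wells with no copy of the locus."""
    return wells * prob_well_empty(copies, wells)


# ~100 copies of a locus split across 100 wells: about 37% of wells get none.
print(round(prob_well_empty(100, 100), 3))   # 0.366
print(round(math.exp(-100 / 100), 3))        # Poisson approximation: 0.368
```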

In an embodiment, the targeted sequencing may involve one, a plurality, or all of the following steps. a) Generate and amplify a library with adaptor sequences on both ends of DNA fragments. b) Divide into multiple reactions after library amplification. c) Generate and optionally amplify a library with adaptor sequences on both ends of DNA fragments. d) Perform 1000- to 10,000-plex amplification of selected targets using one target specific “Forward” primer per target and one tag specific primer. e) Perform a second amplification from this product using “Reverse” target specific primers and one (or more) primer specific to a universal tag that was introduced as part of the target specific forward primers in the first round. f) Perform a 1000-plex preamplification of selected targets for a limited number of cycles. g) Divide the product into multiple aliquots and amplify subpools of targets in individual reactions (for example, 50 to 500-plex, though this can be used all the way down to singleplex). h) Pool products of parallel subpool reactions. i) During these amplifications, primers may carry sequencing compatible tags (partial or full length) such that the products can be sequenced.

Highly Multiplexed PCR

Disclosed herein are methods that permit the targeted amplification of over a hundred to tens of thousands of target sequences (e.g. SNP loci) from genomic DNA obtained from plasma. The amplified sample may be relatively free of primer dimer products and have low allelic bias at target loci. If during or after amplification the products are appended with sequencing compatible adaptors, analysis of these products can be performed by sequencing.

Performing a highly multiplexed PCR amplification using methods known in the art results in the generation of primer dimer products that are in excess of the desired amplification products and not suitable for sequencing. These can be reduced empirically by eliminating primers that form these products, or by performing in silico selection of primers. However, the larger the number of assays, the more difficult this problem becomes.
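One way to see why the problem grows with the number of assays is to count the primer pairings that could interact. The sketch below assumes one forward and one reverse primer per assay and counts every unordered pair, including a primer paired with itself; it is a combinatorial illustration, not a primer-dimer predictor.

```python
def primer_pair_interactions(assays: int) -> int:
    """Number of distinct primer pairings (including a primer with itself)
    that could form primer dimers in one reaction with one forward and one
    reverse primer per assay."""
    primers = 2 * assays
    return primers * (primers + 1) // 2


for plex in (50, 100, 1_000, 5_000, 10_000):
    print(plex, primer_pair_interactions(plex))

# Splitting a 5,000-plex into fifty 100-plex pools reduces the pairings per sample
# from 50,005,000 to 50 * 20,100 = 1,005,000.
```

The count grows roughly with the square of the number of assays, which is consistent with empirical or in silico primer elimination becoming harder as the panel grows.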

One solution is to split a 5,000-plex reaction into several lower-plexed amplifications, e.g. one hundred 50-plex or fifty 100-plex reactions, to use microfluidics, or even to split the sample into individual PCR reactions. However, if the sample DNA is limited, such as in non-invasive prenatal diagnostics from pregnancy plasma, dividing the sample between multiple reactions should be avoided, as this will result in bottlenecking.

Described herein are methods to first globally amplify the plasma DNA of a sample and then divide the sample into multiple multiplexed target enrichment reactions with more moderate numbers of target sequences per reaction. In an embodiment, a method of the present disclosure can be used for preferentially enriching a DNA mixture at a plurality of loci, the method comprising one or more of the following steps: generating and amplifying a library from a mixture of DNA where the molecules in the library have adaptor sequences ligated on both ends of the DNA fragments, dividing the amplified library into multiple reactions, and performing a first round of multiplex amplification of selected targets using one target specific “forward” primer per target and one or a plurality of adaptor specific universal “reverse” primers. In an embodiment, a method of the present disclosure further includes performing a second amplification using “reverse” target specific primers and one or a plurality of primers specific to a universal tag that was introduced as part of the target specific forward primers in the first round. In an embodiment, the method may involve a fully nested, hemi-nested, semi-nested, one sided fully nested, one sided hemi-nested, or one sided semi-nested PCR approach. In an embodiment, a method of the present disclosure is used for preferentially enriching a DNA mixture at a plurality of loci, the method comprising performing a multiplex preamplification of selected targets for a limited number of cycles, dividing the product into multiple aliquots and amplifying subpools of targets in individual reactions, and pooling products of parallel subpool reactions. Note that this approach could be used to perform targeted amplification with low levels of allelic bias for 50-500 loci, for 500 to 5,000 loci, for 5,000 to 50,000 loci, or even for 50,000 to 500,000 loci. In an embodiment, the primers carry partial or full length sequencing compatible tags.

The workflow may entail (1) extracting plasma DNA, (2) preparing a fragment library with universal adaptors on both ends of fragments, (3) amplifying the library using universal primers specific to the adaptors, (4) dividing the amplified sample “library” into multiple aliquots, (5) performing multiplex amplifications on the aliquots (e.g. about 100-plex, 1,000-plex, or 10,000-plex, with one target specific primer per target and a tag-specific primer), (6) pooling the aliquots of one sample, (7) barcoding the sample, (8) mixing the samples and adjusting the concentration, and (9) sequencing the sample. The workflow may comprise multiple sub-steps within one of the listed steps (e.g. step (2), preparing the library, could entail three enzymatic steps (blunt ending, dA tailing, and adaptor ligation) and three purification steps). Steps of the workflow may be combined, divided up, or performed in a different order (e.g. bar coding and pooling of samples).

It is important to note that the amplification of a library can be performed in such a way that it is biased to amplify short fragments more efficiently. In this manner it is possible to preferentially amplify shorter sequences, e.g. mono-nucleosomal DNA fragments such as the cell free fetal DNA (of placental origin) found in the circulation of pregnant women. Note that PCR assays can carry tags, for example sequencing tags (usually a truncated form of 15-25 bases). After multiplexing, the PCR multiplexes of a sample are pooled and the tags are then completed (including bar coding) by a tag-specific PCR (this could also be done by ligation). Alternatively, the full sequencing tags can be added in the same reaction as the multiplexing: in the first cycles, targets may be amplified with the target specific primers, after which the tag-specific primers take over to complete the SQ-adaptor sequence. The PCR primers may also carry no tags, in which case the sequencing tags may be appended to the amplification products by ligation.

In an embodiment, highly multiplex PCR followed by evaluation of amplified material by clonal sequencing may be used to detect transplant rejection status. Whereas traditional multiplex PCRs evaluate up to fifty loci simultaneously, the approach described herein may be used to enable simultaneous evaluation of more than 50, more than 100, more than 500, more than 1,000, more than 5,000, more than 10,000, more than 50,000, or more than 100,000 loci. Experiments have shown that up to, including, and more than 10,000 distinct loci can be evaluated simultaneously, in a single reaction, with sufficiently good efficiency and specificity to make non-invasive transplant status calls with high accuracy. Assays may be combined in a single reaction with the entirety of a cfDNA sample isolated from transplant recipient plasma, a fraction thereof, or a further processed derivative of the cfDNA sample. The cfDNA or derivative may also be split into multiple parallel multiplex reactions. The optimum degree of sample splitting and multiplexing is determined by trading off various performance specifications. Due to the limited amount of material, splitting the sample into multiple fractions can introduce sampling noise, add handling time, and increase the possibility of error. Conversely, higher multiplexing can result in greater amounts of spurious amplification and greater inequalities in amplification, both of which can reduce test performance.

Two crucial related considerations in the application of the methods described herein are the limited amount of original plasma and the number of original molecules in that material from which allele frequency or other measurements are obtained. If the number of original molecules falls below a certain level, random sampling noise becomes significant and can affect the accuracy of the test. Typically, data of sufficient quality for making non-invasive prenatal aneuploidy diagnoses can be obtained if measurements are made on a sample comprising the equivalent of 500-1000 original molecules per target locus. There are a number of ways of increasing the number of distinct measurements, for example increasing the sample volume. Each manipulation applied to the sample also potentially results in losses of material. It is essential to characterize the losses incurred by the various manipulations and to avoid certain manipulations, or improve their yield as necessary, so that losses do not degrade the performance of the test.
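Random sampling noise at a locus can be approximated with a binomial model: if n original molecules are sampled from a locus whose true allele fraction is p, the standard error of the observed fraction is sqrt(p(1 − p)/n). The sketch below assumes this model and ignores amplification and sequencing noise, so it is a lower bound on the real measurement noise.

```python
import math


def allele_fraction_se(p: float, molecules: int) -> float:
    """Binomial standard error of an observed allele fraction when `molecules`
    original molecules are sampled at a locus with true allele fraction `p`."""
    return math.sqrt(p * (1 - p) / molecules)


# Heterozygous locus (p = 0.5) at different numbers of original molecules.
for n in (100, 500, 1_000):
    print(n, round(allele_fraction_se(0.5, n), 4))
```

At 500 to 1000 molecules the standard error for a heterozygous locus falls to about 0.022 to 0.016, compared with 0.05 at 100 molecules.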

In an embodiment, it is possible to mitigate potential losses in subsequent steps by amplifying all or a fraction of the original cfDNA sample. Various methods are available to amplify all of the genetic material in a sample, increasing the amount available for downstream procedures. In an embodiment, in ligation mediated PCR (LM-PCR), DNA fragments are amplified by PCR after ligation of one distinct adaptor, two distinct adaptors, or many distinct adaptors. In an embodiment, in multiple displacement amplification (MDA), phi-29 polymerase is used to amplify all DNA isothermally. In DOP-PCR and its variations, random priming is used to amplify the original DNA. Each method has certain characteristics, such as uniformity of amplification across all represented regions of the genome, efficiency of capture and amplification of original DNA, and amplification performance as a function of the length of the fragment.

In an embodiment, LM-PCR may be used with a single heteroduplexed adaptor having a 3-prime thymine. The heteroduplexed adaptor enables the use of a single adaptor molecule that may be converted to two distinct sequences on the 5-prime and 3-prime ends of the original DNA fragment during the first round of PCR. In an embodiment, it is possible to fractionate the amplified library by size separation, or by using products such as AMPURE, TASS, or other similar methods. Prior to ligation, sample DNA may be blunt ended, and then a single adenosine base is added to the 3-prime end. Prior to ligation, the DNA may also be cleaved using a restriction enzyme or some other cleavage method. During ligation, the 3-prime adenosine of the sample fragments and the complementary 3-prime thymine overhang of the adaptor can enhance ligation efficiency. The duration of the extension step of the PCR amplification may be limited to reduce amplification from fragments longer than about 200 bp, about 300 bp, about 400 bp, about 500 bp, or about 1,000 bp. Since longer DNA found in the transplant recipient plasma is nearly exclusively maternal, this may result in the enrichment of fetal DNA by 10-50% and improvement of test performance. A number of reactions were run using conditions as specified by commercially available kits; these resulted in successful ligation of fewer than 10% of sample DNA molecules. A series of optimizations of the reaction conditions improved ligation to approximately 70%.

Mini-PCR

Traditional PCR assay design results in significant losses of distinct donor-derived nucleic acid molecules, but losses can be greatly reduced by designing very short PCR assays, termed mini-PCR assays. cfDNA in recipient serum is highly fragmented, and the fragment sizes are distributed in approximately a Gaussian fashion with a mean of 160 bp, a standard deviation of 15 bp, a minimum size of about 100 bp, and a maximum size of about 220 bp. The distribution of fragment start and end positions with respect to the targeted polymorphisms, while not necessarily random, varies widely among individual targets and among all targets collectively, and the polymorphic site of one particular target locus may occupy any position from the start to the end among the various fragments originating from that locus. Note that the term mini-PCR may equally well refer to normal PCR with no additional restrictions or limitations.

During PCR, amplification will only occur from template DNA fragments comprising both forward and reverse primer sites. Because donor-derived cfDNA fragments are short, the likelihood of a fragment of length L comprising both the forward and reverse primer sites is approximately one minus the ratio of the length of the amplicon to L. Under ideal conditions, and taking L as the 160 bp mean fragment length, assays in which the amplicon is 45, 50, 55, 60, 65, or 70 bp will successfully amplify from 72%, 69%, 66%, 63%, 59%, or 56%, respectively, of available template fragment molecules. The amplicon length is the distance between the 5-prime ends of the forward and reverse priming sites. Amplicon lengths shorter than those typically used by those skilled in the art may result in more efficient measurements of the desired polymorphic loci by only requiring short sequence reads. In an embodiment, a substantial fraction of the amplicons should be less than 100 bp, less than 90 bp, less than 80 bp, less than 70 bp, less than 65 bp, less than 60 bp, less than 55 bp, less than 50 bp, or less than 45 bp. In some embodiments, the amplicons are between 50 and 100 bp in length, or between 60 and 80 bp in length. In some embodiments, the amplicons are about 65 bp in length.
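The percentages above can be reproduced with the approximation (L − a)/L, where a is the amplicon length and L the fragment length, taking L = 160 bp. This minimal sketch assumes a single fixed fragment length and uniformly distributed fragment positions; it does not integrate over the Gaussian size distribution.

```python
def template_fraction(amplicon_bp: int, fragment_bp: int = 160) -> float:
    """Approximate fraction of fragments of length `fragment_bp` that contain
    both priming sites of an assay spanning `amplicon_bp`: (L - a) / L."""
    if amplicon_bp > fragment_bp:
        return 0.0
    return (fragment_bp - amplicon_bp) / fragment_bp


for a in (45, 50, 55, 60, 65, 70):
    # Round half up so the output matches the percentages quoted in the text.
    print(a, int(template_fraction(a) * 100 + 0.5))
# 45 72, 50 69, 55 66, 60 63, 65 59, 70 56
```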

Note that in methods known in the prior art, short assays such as those described herein are usually avoided because they are not required and they impose considerable constraints on primer design by limiting primer length, annealing characteristics, and the distance between the forward and reverse primers.

Also note that there is the potential for biased amplification if the 3-prime end of either primer is within roughly 1-6 bases of the polymorphic site. This single base difference at the site of initial polymerase binding can result in preferential amplification of one allele, which can alter observed allele frequencies and degrade performance. All of these constraints make it very challenging to identify primers that will amplify a particular locus successfully and, furthermore, to design large sets of primers that are compatible in the same multiplex reaction. In an embodiment, the 3′ ends of the inner forward and reverse primers are designed to hybridize to a region of DNA upstream from the polymorphic site, separated from the polymorphic site by a small number of bases. Ideally, the number of bases may be between 6 and 10, but may equally well be between 4 and 15, between 3 and 20, between 2 and 30, or between 1 and 60 bases, and achieve substantially the same end.
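A minimal screening sketch for the 3′-end constraint, assuming plus-strand coordinates for the base bound by the primer's 3′ end and for the polymorphic site. Separation counts intervening bases, so 0 means the 3′ end binds the base next to the SNP. The thresholds mirror the ranges in the text (roughly 1-6 bases at risk, 6-10 bases preferred, up to 60 bases acceptable); the function names and category labels are hypothetical.

```python
def three_prime_separation(primer_3p_pos: int, snp_pos: int) -> int:
    """Number of bases between the base bound by a primer's 3' end and the
    polymorphic site (0 means immediately adjacent, -1 means on the site)."""
    return abs(snp_pos - primer_3p_pos) - 1


def classify_primer(primer_3p_pos: int, snp_pos: int) -> str:
    sep = three_prime_separation(primer_3p_pos, snp_pos)
    if sep < 0:
        return "3' end on polymorphic site"
    if sep < 6:          # 3' end within roughly 1-6 positions of the SNP
        return "at risk of allelic bias"
    if sep <= 10:
        return "preferred"
    if sep <= 60:
        return "acceptable"
    return "outside described range"


for pos in (998, 990, 1030, 1100):
    print(pos, classify_primer(pos, 1000))
```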

Multiplex PCR may involve a single round of PCR in which all targets are amplified, or it may involve one round of PCR followed by one or more rounds of nested PCR or some variant of nested PCR. Nested PCR consists of a subsequent round or rounds of PCR amplification using one or more new primers that bind internally, by at least one base pair, to the primers used in a previous round. Nested PCR reduces the number of spurious amplification targets by amplifying, in subsequent reactions, only those amplification products from the previous one that have the correct internal sequence. Reducing spurious amplification targets improves the number of useful measurements that can be obtained, especially in sequencing. Nested PCR typically entails designing primers completely internal to the previous primer binding sites, necessarily increasing the minimum DNA segment size required for amplification. For samples such as transplant recipient plasma cfDNA, in which the DNA is highly fragmented, the larger assay size reduces the number of distinct cfDNA molecules from which a measurement can be obtained. In an embodiment, to offset this effect, one may use a partial nesting approach where one or both of the second round primers overlap the first binding sites, extending internally some number of bases to achieve additional specificity while minimally increasing the total assay size.
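The assay-size cost of nesting can be illustrated with simple arithmetic. This sketch assumes that the minimum usable fragment must carry the first-round priming sites, and that each nested side lengthens the required template by how far the second-round primer's 5′ end sits inside the first-round primer's 5′ end (for full nesting, at least the first-round primer length). The 65 bp inner amplicon and 20-nt outer primers are hypothetical values, and the usable-fragment fraction uses the (L − a)/L approximation with L = 160 bp.

```python
def min_template_bp(inner_amplicon_bp: int, left_shift_bp: int = 0, right_shift_bp: int = 0) -> int:
    """Minimum fragment length that still carries the first-round priming sites.
    Each shift is how far a second-round primer's 5' end sits inside the
    corresponding first-round primer's 5' end (0 = that side is not nested)."""
    return inner_amplicon_bp + left_shift_bp + right_shift_bp


def template_fraction(amplicon_bp: int, fragment_bp: int = 160) -> float:
    # (L - a) / L approximation for a single fixed fragment length L.
    return max(fragment_bp - amplicon_bp, 0) / fragment_bp


inner = 65  # hypothetical inner amplicon length in bp
designs = {
    "not nested": min_template_bp(inner),
    "fully nested, 20-nt outer primers": min_template_bp(inner, 20, 20),
    "one-sided, 6-base partial nesting": min_template_bp(inner, 6, 0),
}
for name, size in designs.items():
    print(f"{name}: {size} bp, {template_fraction(size):.0%} of 160-bp fragments usable")
```

Under these assumptions, one-sided partial nesting keeps the required template close to the un-nested assay, whereas full nesting costs roughly a third of the usable fragments.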

In an embodiment, a multiplex pool of PCR assays is designed to amplify potentially heterozygous SNP or other polymorphic or non-polymorphic loci on one or more chromosomes, and these assays are used in a single reaction to amplify DNA. The number of PCR assays may be between 50 and 200, between 200 and 1,000, between 1,000 and 5,000, between 5,000 and 20,000, or more than 20,000 (50 to 200-plex, 200 to 1,000-plex, 1,000 to 5,000-plex, 5,000 to 20,000-plex, and more than 20,000-plex, respectively). In an embodiment, a multiplex pool of about 10,000 PCR assays (10,000-plex) is designed to amplify potentially heterozygous SNP loci on chromosomes X, Y, 13, 18, and 21, and 1 or 2, and these assays are used in a single reaction to amplify cfDNA obtained from a maternal plasma sample, chorion villus samples, amniocentesis samples, a single cell or a small number of cells, other bodily fluids or tissues, cancers, or other genetic matter. The SNP frequencies of each locus may be determined by clonal or some other method of sequencing of the amplicons. Statistical analysis of the allele frequency distributions or ratios of all assays may be used to determine whether the sample contains a trisomy of one or more of the chromosomes included in the test. In another embodiment, the original cfDNA sample is split into two samples and parallel 5,000-plex assays are performed. In another embodiment, the original cfDNA sample is split into n samples and parallel (10,000/n)-plex assays are performed, where n is between 2 and 12, between 12 and 24, between 24 and 48, or between 48 and 96. Data are collected and analyzed in a similar manner to that already described. Note that this method is equally applicable to detecting translocations, deletions, duplications, and other chromosomal abnormalities.
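A minimal sketch of dividing a 10,000-assay panel into n parallel (10,000/n)-plex pools, assuming round-robin assignment by assay order. A real pool assignment would more likely also account for primer compatibility within each pool; the assay identifiers are hypothetical.

```python
from __future__ import annotations


def split_into_pools(assays: list[str], n: int) -> list[list[str]]:
    """Distribute assays round-robin into n parallel pools; pool sizes differ
    by at most one assay."""
    if not 2 <= n <= 96:
        raise ValueError("n should be between 2 and 96")
    return [assays[i::n] for i in range(n)]


panel = [f"assay_{i:05d}" for i in range(10_000)]   # hypothetical assay IDs
pools = split_into_pools(panel, 12)
print(len(pools), sorted({len(p) for p in pools}))  # 12 [833, 834]
```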

In an embodiment, tails with no homology to the target genome may also be added to the 3-prime or 5-prime end of any of the primers. These tails facilitate subsequent manipulations, procedures, or measurements. In an embodiment, the tail sequence can be the same for the forward and reverse target specific primers. In an embodiment, different tails may be used for the forward and reverse target specific primers. In an embodiment, a plurality of different tails may be used for different loci or sets of loci. Certain tails may be shared among all loci or among subsets of loci. For example, using forward and reverse tails corresponding to forward and reverse sequences required by any of the current sequencing platforms can enable direct sequencing following amplification. In an embodiment, the tails can be used as common priming sites among all amplified targets that can be used to add other useful sequences. In some embodiments, the inner primers may contain a region that is designed to hybridize either upstream or downstream of the targeted polymorphic locus. In some embodiments, the primers may contain a molecular barcode. In some embodiments, the primer may contain a universal priming sequence designed to allow PCR amplification.
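The sketch below assembles a tailed primer as 5′-[tail][barcode][target-specific]-3′, one possible layout among those described above. All sequences are made up for illustration and do not correspond to any sequencing platform's adaptors; `tailed_primer` is a hypothetical helper.

```python
from __future__ import annotations

VALID_BASES = set("ACGT")


def tailed_primer(target_specific: str, tail: str = "", barcode: str = "") -> str:
    """Assemble a primer as 5'-[tail][barcode][target-specific]-3'.

    The tail has no homology to the target genome; the barcode is optional.
    """
    seq = (tail + barcode + target_specific).upper()
    if not set(seq) <= VALID_BASES:
        raise ValueError("unexpected base in primer sequence")
    return seq


# Hypothetical sequences: one shared tail per orientation gives every amplicon
# the same downstream priming site.
FWD_TAIL, REV_TAIL = "CTACGTAGCTAG", "GATCGATCCTAG"
print(tailed_primer("TGCATGCCTAGGACTT", FWD_TAIL))
print(tailed_primer("GGATCCAAGTCAGTCA", REV_TAIL, barcode="ATCG"))
```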

In an embodiment, a 10,000-plex PCR assay pool is created such that forward and reverse primers have tails corresponding to the forward and reverse sequences required by a high throughput sequencing instrument such as the HISEQ, GAIIX, or MISEQ available from ILLUMINA. In addition, an additional sequence located 5-prime to the sequencing tails can be used as a priming site in a subsequent PCR to add nucleotide barcode sequences to the amplicons, enabling multiplex sequencing of multiple samples in a single lane of the high throughput sequencing instrument.

In an embodiment, a 10,000-plex PCR assay pool is created such that reverse primers have tails corresponding to the reverse sequences required by a high throughput sequencing instrument. After amplification with the first 10,000-plex assay, a subsequent PCR amplification may be performed using another 10,000-plex pool having partly nested forward primers (e.g. 6-bases nested) for all targets and a reverse primer corresponding to the reverse sequencing tail included in the first round. This subsequent round of partly nested amplification, with just one target specific primer and a universal primer, limits the required size of the assay, reducing sampling noise, and greatly reduces the number of spurious amplicons. The sequencing tags can be added to appended ligation adaptors and/or as part of PCR probes, such that the tag is part of the final amplicon.

The mini-PCR method described in this disclosure enables highly multiplexed amplification and analysis of hundreds to thousands or even millions of loci in a single reaction, from a single sample. At the same time, the detection of the amplified DNA can be multiplexed; tens to hundreds of samples can be multiplexed in one sequencing lane by using barcoding PCR. This multiplexed detection has been successfully tested up to 49-plex, and a much higher degree of multiplexing is possible. In effect, this allows hundreds of samples to be genotyped at thousands of SNPs in a single sequencing run. For these samples, the method allows determination of genotype and heterozygosity rate. This method may be used for any amount of DNA or RNA, and the targeted regions may be SNPs, other polymorphic regions, non-polymorphic regions, and combinations thereof.
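Barcoded samples sharing a lane are separated after sequencing by their barcodes. The sketch below assumes the barcode is an inline, fixed-length sequence at the start of each read and requires an exact match; many instruments instead read the index in a separate index read, and production demultiplexers usually tolerate mismatches. The function and sample names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def demultiplex(reads: list[str], sample_barcodes: dict[str, str]) -> tuple[dict[str, list[str]], list[str]]:
    """Assign reads to samples by an exact-match barcode at the start of the read.

    Assumes all barcodes have the same length; the barcode is trimmed from
    assigned reads. Reads with an unknown barcode are returned separately.
    """
    by_barcode = {bc: sample for sample, bc in sample_barcodes.items()}
    bc_len = len(next(iter(by_barcode)))
    assigned: dict[str, list[str]] = defaultdict(list)
    unassigned: list[str] = []
    for read in reads:
        sample = by_barcode.get(read[:bc_len])
        if sample is None:
            unassigned.append(read)
        else:
            assigned[sample].append(read[bc_len:])
    return dict(assigned), unassigned


samples = {"S1": "AACC", "S2": "GGTT"}
print(demultiplex(["AACCTTTT", "GGTTACGT", "CCCCAAAA", "AACCGGGG"], samples))
```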

In some embodiments, ligation mediated universal-PCR amplification of fragmented DNA may be used. The ligation mediated universal-PCR amplification can be used to amplify plasma DNA, which can then be divided into multiple parallel reactions. It may also be used to preferentially amplify short fragments, thereby enriching the fetal fraction. In some embodiments, the addition of tags to the fragments by ligation can enable detection of shorter fragments, use of shorter target sequence specific portions of the primers, and/or annealing at higher temperatures, which reduces nonspecific reactions.

The methods described herein may be used for a number of purposes where there is a target set of DNA that is mixed with an amount of contaminating DNA. In some embodiments, the target and contaminating DNA may be from the same individual but differ by one or more mutations, for example in the case of cancer (see e.g. H. Mamon et al. Preferential Amplification of Apoptotic DNA from Plasma: Potential for Enhancing Detection of Minor DNA Alterations in Circulating DNA. Clinical Chemistry 54:9 (2008)). In some embodiments, the DNA may be found in cell culture (apoptotic) supernatant. In some embodiments, it is possible to induce apoptosis in biological samples (e.g. blood) for subsequent library preparation, amplification, and/or sequencing. A number of enabling workflows and protocols to achieve this end are presented elsewhere in this disclosure.

In some embodiments, the target DNA may originate from single cells, from samples of DNA consisting of less than one copy of the target genome, from low amounts of DNA, from DNA of mixed origin, from other body fluids, from cell cultures, from culture supernatants, from forensic samples of DNA, from ancient samples of DNA (e.g. insects trapped in amber), from other samples of DNA, and combinations thereof.

In some embodiments, a short amplicon size may be used. Short amplicon sizes are especially suited for fragmented DNA (see e.g. A. Sikora, et al. Detection of increased amounts of cell-free fetal DNA with short PCR amplicons. Clin Chem. 2010 January; 56(1):136-8.)

The use of short amplicon sizes may result in some significant benefits. Short amplicon sizes may result in optimized amplification efficiency. Short amplicon sizes typically produce shorter products, so there is less chance of nonspecific priming. Shorter products can be clustered more densely on a sequencing flow cell, as the clusters will be smaller. In an embodiment, a substantial fraction of the amplicons should be less than 100 bp, less than 90 bp, less than 80 bp, less than 70 bp, less than 65 bp, less than 60 bp, less than 55 bp, less than 50 bp, or less than 45 bp. In some embodiments, the amplicons are between 50 and 100 bp in length, or between 60 and 80 bp in length. In some embodiments, the amplicons are about 65 bp in length.

Note that the methods described herein may work equally well for longer PCR amplicons. Amplicon length may be increased if necessary, for example when sequencing larger sequence stretches. Experiments with 146-plex targeted amplification using assays 100 bp to 200 bp in length as the first step in a nested-PCR protocol were run on single cells and on genomic DNA with positive results.

In some embodiments, the methods described herein may be used to amplify and/or detect SNPs, copy number, nucleotide methylation, mRNA levels, other types of RNA expression levels, and other genetic and/or epigenetic features. The mini-PCR methods described herein may be used along with next-generation sequencing. They may also be used with other downstream methods such as microarrays, counting by digital PCR, real-time PCR, mass-spectrometry analysis, etc.

In some embodiments, the mini-PCR amplification methods described herein may be used as part of a method for accurate quantification of minority populations. They may be used for:

- absolute quantification using spike calibrators;
- mutation/minor allele quantification through very deep sequencing, and may be run in a highly multiplexed fashion;
- standard paternity and identity testing of relatives or ancestors in humans, animals, plants or other creatures;
- forensic testing;
- rapid genotyping and copy number (CN) analysis on any kind of material, e.g. amniotic fluid and CVS, sperm, or product of conception (POC);
- single-cell analysis, such as genotyping on samples biopsied from embryos;
- rapid embryo analysis (within less than one, one, or two days of biopsy) by targeted sequencing using mini-PCR.

In some embodiments, the mini-PCR methods may be used for tumor analysis, since tumor biopsies are often a mixture of healthy and tumor cells. Targeted PCR allows deep sequencing of SNPs and loci with close to no background sequences. The methods may be used for copy number and loss of heterozygosity analysis on tumor DNA, which may be present in many different body fluids or tissues of tumor patients. They may be used for detection of tumor recurrence and/or tumor screening. They may also be used for quality control testing of seeds, and for breeding or fishing purposes. Note that any of these methods could equally well be used to target non-polymorphic loci for the purpose of ploidy calling.

Some literature describing fundamental methods that underlie the methods disclosed herein includes:

(1) Wang H Y, Luo M, Tereshchenko I V, Frikker D M, Cui X, Li J Y, Hu G, Chu Y, Azaro M A, Lin Y, Shen L, Yang Q, Kambouris M E, Gao R, Shih W, Li H. Genome Res. 2005 February; 15(2):276-83. Department of Molecular Genetics, Microbiology and Immunology/The Cancer Institute of New Jersey, Robert Wood Johnson Medical School, New Brunswick, N.J. 08903, USA.

(2) High-throughput genotyping of single nucleotide polymorphisms with high sensitivity. Li H, Wang H Y, Cui X, Luo M, Hu G, Greenawalt D M, Tereshchenko I V, Li J Y, Chu Y, Gao R. Methods Mol Biol. 2007; 396. PubMed PMID: 18025699.

(3) A method comprising multiplexing of an average of 9 assays for sequencing is described in: Nested Patch PCR enables highly multiplexed mutation discovery in candidate genes. Varley K E, Mitra R D. Genome Res. 2008 November; 18(11):1844-50. Epub 2008 Oct. 10.

Note that the methods disclosed herein allow multiplexing of orders of magnitude more assays than in the above references.

Primer Design

Highly multiplexed PCR can often produce a very high proportion of product DNA that results from unproductive side reactions such as primer dimer formation. In an embodiment, the particular primers that are most likely to cause unproductive side reactions may be removed from the primer library. The resulting library yields a greater proportion of amplified DNA that maps to the genome. Removing problematic primers (that is, those primers that are particularly likely to form dimers) has unexpectedly enabled extremely high PCR multiplexing levels for subsequent analysis by sequencing. In systems such as sequencing, where performance is significantly degraded by primer dimers and/or other mischief products, multiplexing greater than 10, greater than 50, and greater than 100 times higher than other described multiplexing has been achieved. This is in contrast to probe-based detection methods, e.g. microarrays, TAQMAN, PCR, etc., where an excess of primer dimers will not affect the outcome appreciably. Also note that the general belief in the art is that multiplexing PCR for sequencing is limited to about 100 assays in the same well. For example, Fluidigm and RainDance offer platforms to perform 48 or 1000s of PCR assays in parallel reactions for one sample.

There are a number of ways to choose primers for a library such that the amount of non-mapping primer-dimer or other primer mischief products is minimized. Empirical data indicate that a small number of ‘bad’ primers are responsible for a large amount of non-mapping primer dimer side reactions. Removing these ‘bad’ primers can increase the percent of sequence reads that map to targeted loci.

One way to identify the ‘bad’ primers is to look at the sequencing data of DNA that was amplified by targeted amplification. The primers forming the most frequently seen primer dimers can be removed, giving a primer library that is significantly less likely to produce side-product DNA that does not map to the genome. There are also publicly available programs that can calculate the binding energy of various primer combinations. Removing the primers with the highest binding energy will also give a primer library that is significantly less likely to produce side-product DNA that does not map to the genome.
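The empirical approach above ranks primers by how often they appear in observed dimer reads. It can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation. It assumes the dimer reads have already been identified and attributed to primer IDs upstream, for example by matching read sequences against the primer library. The input format and the function names (`rank_dimer_primers`, `prune_bad_primers`) are hypothetical.

```python
from collections import Counter


def rank_dimer_primers(dimer_reads):
    """Rank primers by how many dimer reads they participate in.

    dimer_reads: iterable of (primer_id_1, primer_id_2) tuples, one per
    sequencing read classified as a primer dimer (hypothetical format).
    Returns a list of (primer_id, count) pairs, most frequent first.
    """
    counts = Counter()
    for p1, p2 in dimer_reads:
        counts[p1] += 1
        if p2 != p1:  # a self-dimer is counted once for its single primer
            counts[p2] += 1
    return counts.most_common()


def prune_bad_primers(primers, dimer_reads, n_remove):
    """Return the primer library without the n_remove worst offenders."""
    ranked = rank_dimer_primers(dimer_reads)
    bad = {pid for pid, _ in ranked[:n_remove]}
    return [p for p in primers if p not in bad]


# Toy data: P1 and P3 account for most of the observed dimer reads.
library = ["P1", "P2", "P3", "P4", "P5"]
observed = [("P1", "P3"), ("P1", "P3"), ("P1", "P4"), ("P2", "P5")]
pruned = prune_bad_primers(library, observed, n_remove=2)
```

In a real primer library, one may prefer to drop the whole assay (both primers of a pair) rather than a single primer. The amplification and sequencing can then be repeated to check whether the dimer fraction has fallen.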

Multiplexing large numbers of primers imposes considerable constraints on the assays that can be included. Assays that unintentionally interact result in spurious amplification products. The size constraints of mini-PCR may impose further constraints.

In an embodiment, it is possible to begin with a very large number of potential SNP targets (between about 500 and more than 1 million) and attempt to design primers to amplify each SNP. Where primers can be designed, primer pairs likely to form spurious products can be identified. This is done by evaluating the likelihood of spurious primer duplex formation between all possible pairs of primers, using published thermodynamic parameters for DNA duplex formation. Primer interactions may be ranked by a scoring function related to the interaction. Primers with the worst interaction scores are then eliminated until the desired number of primers is reached. In cases where SNPs likely to be heterozygous are most useful, it is also possible to rank the list of assays and select the most heterozygous compatible assays. Experiments have validated that primers with high interaction scores are most likely to form primer dimers.

At high multiplexing it is not possible to eliminate all spurious interactions. It is, however, essential to remove in silico the primers or pairs of primers with the highest interaction scores, as they can dominate an entire reaction and greatly limit amplification from intended targets. We have performed this procedure to create multiplex primer sets of up to 10,000 primers. The improvement due to this procedure is substantial, as determined by sequencing of all PCR products. It enables more than 80%, more than 90%, more than 95%, more than 98%, and even more than 99% of amplification products to be on target, compared to 10% from a reaction in which the worst primers were not removed. When combined with a partial semi-nested approach as previously described, more than 90%, and even more than 95%, of amplicons may map to the targeted sequences.
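The in silico elimination loop described above can be sketched as follows. The source scores primer pairs with published thermodynamic parameters for DNA duplex formation. To keep this example self-contained, the sketch substitutes a much cruder stand-in score: the longest 3′ end of one primer that is perfectly complementary to a stretch of the other. The numbers it produces are therefore not thermodynamic energies. A thermodynamic tool such as Primer3 could supply real heterodimer values in place of `pair_score`. All function names here are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def three_prime_overlap(a, b):
    """Length of the longest 3' suffix of primer a that can pair perfectly with b.

    The suffix a[-k:] can anneal to b exactly when revcomp(a[-k:]) occurs in b,
    i.e. when a[-k:] occurs in revcomp(b).
    """
    rc_b = revcomp(b)
    best = 0
    for k in range(1, len(a) + 1):
        if a[-k:] not in rc_b:
            break  # every longer suffix contains this one, so none can match
        best = k
    return best


def pair_score(a, b):
    """Stand-in interaction score (higher = more likely to dimerize)."""
    return max(three_prime_overlap(a, b), three_prime_overlap(b, a))


def prune_by_interaction(primers, n_keep):
    """Greedily drop the primer with the worst interaction until n_keep remain.

    primers: dict mapping primer id -> sequence.
    """
    kept = dict(primers)
    while len(kept) > n_keep:
        ids = list(kept)
        worst = {pid: 0 for pid in ids}
        for i, p in enumerate(ids):
            for q in ids[i + 1:]:
                s = pair_score(kept[p], kept[q])
                worst[p] = max(worst[p], s)
                worst[q] = max(worst[q], s)
        # max() returns the first primer with the highest worst-case score
        drop = max(ids, key=lambda pid: worst[pid])
        del kept[drop]
    return kept


library = {"P1": "AAAAAAAA", "P2": "TTTTTTTT", "P3": "GAGAGAGA"}
remaining = prune_by_interaction(library, n_keep=2)
```

Recomputing every pair on each round keeps the sketch short. For primer sets approaching 10,000 primers, one would more likely compute the pairwise score matrix once and update only the affected rows after each removal.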

Note that there are other methods for determining which PCR primers are likely to form dimers. In an embodiment, analysis of a pool of DNA that has been amplified using a non-optimized set of primers may be sufficient to identify problematic primers. For example, the analysis may be done by sequencing. The primers forming the most abundant dimers are judged to be those most likely to form dimers, and may be removed.

This method has a number of potential applications, for example SNP genotyping, heterozygosity rate determination, copy number measurement, and other targeted sequencing applications. In an embodiment, the method of primer design may be used in combination with the mini-PCR method described elsewhere in this document. In some embodiments, the primer design method may be used as part of a massively multiplexed PCR method.

The use of tags on the primers may reduce amplification and sequencing of primer dimer products. Tag-primers can be used to shorten the necessary target-specific sequence to below 20, below 15, below 12, and even below 10 base pairs. This can happen serendipitously with standard primer design when the target sequence is fragmented within the primer binding site, or it can be deliberately designed into the primers. This approach has two advantages: it increases the number of assays that can be designed for a given maximal amplicon length, and it shortens the “non-informative” sequencing of primer sequence. It may also be used in combination with internal tagging (see elsewhere in this document).

In an embodiment, the relative amount of nonproductive products in the multiplexed targeted PCR amplification can be reduced by raising the annealing temperature. In cases where one is amplifying libraries with the same tag as the target-specific primers, the annealing temperature can be increased relative to genomic DNA, as the tags will contribute to primer binding. In some embodiments, we use considerably lower primer concentrations than previously reported, along with longer annealing times than reported elsewhere.

In some embodiments, the annealing times may be longer than 10 minutes, longer than 20 minutes, longer than 30 minutes, longer than 60 minutes, longer than 120 minutes, longer than 240 minutes, longer than 480 minutes, and even longer than 960 minutes. In an embodiment, longer annealing times are used than in previous reports, allowing lower primer concentrations. In some embodiments, the primer concentrations are as low as 50 nM, 20 nM, 10 nM, 5 nM, 1 nM, and lower than 1 nM. This surprisingly results in robust performance for highly multiplexed reactions, for example 1,000-plex, 2,000-plex, 5,000-plex, 10,000-plex, 20,000-plex, 50,000-plex, and even 100,000-plex reactions. In an embodiment, the amplification uses one, two, three, four or five cycles run with long annealing times, followed by PCR cycles with more usual annealing times using tagged primers.

To select target locations, one may start with a pool of candidate primer pair designs and create a thermodynamic model of potentially adverse interactions between primer pairs. The model can then be used to eliminate designs that are incompatible with other designs in the pool.

Targeted PCR Variants—Nesting

Many workflows are possible when conducting PCR; some workflows typical of the methods disclosed herein are described below. The steps outlined are not meant to exclude other possible steps, nor do they imply that any of the steps described are required for the method to work properly. A large number of parameter variations or other modifications are known in the literature, and may be made without affecting the essence of the invention. One particular generalized workflow is given below, followed by a number of possible variants. The variants typically refer to possible secondary PCR reactions, for example the different types of nesting that may be done in step 3. Variants may also be done at different times, or in different orders, than explicitly described herein.

1. Ligation adapters, often referred to as library tags or ligation adaptor tags (LTs), may be appended to the DNA in the sample. The ligation adapters contain a universal priming sequence, and appending them is followed by a universal amplification. In an embodiment, this may be done using a standard protocol designed to create sequencing libraries after fragmentation. In an embodiment, the DNA sample can be blunt ended, and then an A can be added at the 3′ end. A Y-adaptor with a T-overhang can then be added and ligated. In some embodiments, sticky ends other than an A or T overhang can be used. In some embodiments, other adaptors can be added, for example looped ligation adaptors. In some embodiments, the adaptors may have tags designed for PCR amplification.

2. Specific Target Amplification (STA): Pre-amplification of hundreds, thousands, tens of thousands and even hundreds of thousands of targets may be multiplexed in one reaction.

STA is typically run for 10 to 30 cycles, though it may be run for 5 to 40 cycles, 2 to 50 cycles, and even 1 to 100 cycles. In some embodiments, between 1 and 10 cycles of PCR may be carried out. In some embodiments between 10 and 20 cycles, in some embodiments between 20 and 30 cycles, and in some embodiments between 30 and 40 cycles of PCR may be carried out. In some embodiments more than 40 cycles of PCR may be carried out. The amplification may be a linear amplification.

Primers may be tailed, for example for a simpler workflow or to avoid sequencing of a large proportion of dimers. Typically, dimers of two primers carrying the same tag will not be amplified or sequenced efficiently.

The number of PCR cycles may be optimized to give an optimal depth of read (DOR) profile, and different DOR profiles may be desirable for different purposes.
In some embodiments, a more even distribution of reads between all assays is desirable. If the DOR is too small for some assays, the stochastic noise can be too high for the data to be useful. If the DOR is too high, the marginal usefulness of each additional read is relatively small.

Primer tails may improve the detection of fragmented DNA from universally tagged libraries. If the library tag and the primer tails contain a homologous sequence, hybridization can be improved (for example, the melting temperature (Tm) is raised). Primers can then be extended even if only a portion of the primer target sequence is present in the sample DNA fragment. In some embodiments, 13 or more target-specific base pairs may be used. In some embodiments, 10 to 12 target-specific base pairs may be used. In some embodiments, 8 to 9 target-specific base pairs may be used. In some embodiments, 6 to 7 target-specific base pairs may be used.

In some embodiments, STA may be performed on pre-amplified DNA, e.g. MDA, RCA, other whole genome amplifications, or adaptor-mediated universal PCR. In some embodiments, STA may be performed on samples that are enriched or depleted of certain sequences and populations, e.g. by size selection, target capture, or directed degradation.

3. In some embodiments, it is possible to perform secondary multiplex PCRs or primer extension reactions to increase specificity and reduce undesirable products. For example, full nesting, semi-nesting, hemi-nesting, and/or subdividing into parallel reactions of smaller assay pools are all techniques that may be used to increase specificity. Experiments have shown that splitting a sample into three 400-plex reactions resulted in product DNA with greater specificity than one 1,200-plex reaction with exactly the same primers. Similarly, experiments have shown that splitting a sample into four 2,400-plex reactions resulted in product DNA with greater specificity than one 9,600-plex reaction with exactly the same primers. In an embodiment, it is possible to use target-specific and tag-specific primers of the same and opposing directionality.

4. In some embodiments, it is possible to amplify a DNA sample (diluted, purified or otherwise) produced by an STA reaction using tag-specific primers and “universal amplification”, i.e. to amplify many or all pre-amplified and tagged targets. Primers may contain additional functional sequences, e.g. barcodes, or a full adaptor sequence necessary for sequencing on a high-throughput sequencing platform.

These methods may be used for analysis of any sample of DNA. They are especially useful when the sample of DNA is particularly small, or when the DNA originates from more than one individual, as in the case of transplant recipient plasma. These methods may be used on DNA samples such as a single cell or a small number of cells, genomic DNA, plasma DNA, amplified plasma libraries, amplified apoptotic supernatant libraries, or other samples of mixed DNA. In an embodiment, these methods may be used where cells of different genetic constitution may be present in a single individual, such as with cancer or transplants.

Protocol Variants (Variants and/or Additions to the Workflow Above)

Direct multiplexed mini-PCR: In some embodiments, specific target amplification (STA) of a plurality of target sequences with tagged primers is performed. In some embodiments, STA may be done on more than 100, more than 200, more than 500, more than 1,000, more than 2,000, more than 5,000, more than 10,000, more than 20,000, more than 50,000, more than 100,000 or more than 200,000 targets. In a subsequent reaction, tag-specific primers amplify all target sequences and lengthen the tags to include all sequences necessary for sequencing, including sample indexes. In an embodiment, the primers may not be tagged, or only certain primers may be tagged. Sequencing adaptors may be added by conventional adaptor ligation. In an embodiment, the initial primers may carry the tags.

In an embodiment, primers are designed so that the length of DNA amplified is unexpectedly short. The prior art demonstrates that persons of ordinary skill in the art typically design amplicons of 100 bp or more. In an embodiment, the amplicons may be designed to be less than 80 bp. In an embodiment, the amplicons may be designed to be less than 70 bp. In an embodiment, the amplicons may be designed to be less than 60 bp. In an embodiment, the amplicons may be designed to be less than 50 bp. In an embodiment, the amplicons may be designed to be less than 45 bp. In an embodiment, the amplicons may be designed to be less than 40 bp. In an embodiment, the amplicons may be designed to be less than 35 bp. In an embodiment, the amplicons may be designed to be between 40 and 65 bp.

Sequential PCR:

After STA 1, multiple aliquots of the product may be amplified in parallel in pools of reduced complexity using the same primers. The first amplification can give enough material to split. This method is especially good for small samples, for example those of about 6 to 100 pg, about 100 pg to 1 ng, about 1 ng to 10 ng, or about 10 ng to 100 ng. The protocol was performed by splitting a 1,200-plex into three 400-plexes. Mapping of sequencing reads increased from around 60 to 70% in the 1,200-plex alone to over 95%.
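The disclosure does not specify how assays are allocated among the reduced-complexity pools. One plausible approach, shown here purely as a hypothetical sketch, is a greedy assignment. Each assay is placed in the open pool where it adds the least pairwise interaction, subject to near-equal pool sizes (e.g. three pools of 400 from a 1,200-plex). The interaction scores could come from any pairwise scoring scheme. The function names and the score format are illustrative assumptions.

```python
import math
from itertools import combinations


def split_into_pools(assays, scores, n_pools):
    """Greedily assign assays to n_pools pools of near-equal size.

    scores: dict mapping frozenset({assay_a, assay_b}) -> interaction score
    (higher = worse); pairs not in the dict are treated as non-interacting.
    """
    capacity = math.ceil(len(assays) / n_pools)

    def score(a, b):
        return scores.get(frozenset((a, b)), 0)

    # Place the most interaction-prone assays first, while pools are emptiest.
    order = sorted(assays, key=lambda a: -sum(score(a, b) for b in assays if b != a))
    pools = [[] for _ in range(n_pools)]
    for a in order:
        open_pools = [p for p in pools if len(p) < capacity]
        # pick the open pool where assay a adds the least interaction
        best = min(open_pools, key=lambda p: sum(score(a, b) for b in p))
        best.append(a)
    return pools


def within_pool_score(pools, scores):
    """Total interaction score of all assay pairs that share a pool."""
    return sum(
        scores.get(frozenset(pair), 0)
        for pool in pools
        for pair in combinations(pool, 2)
    )


# Toy example: A-B and C-D interact strongly, so they should be separated.
toy_scores = {frozenset({"A", "B"}): 5, frozenset({"C", "D"}): 5}
toy_pools = split_into_pools(["A", "B", "C", "D"], toy_scores, n_pools=2)
```

Greedy assignment is not guaranteed to find the minimum-interaction split. It is simply a cheap heuristic that keeps the worst-interacting assays in separate reactions.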

Semi-Nested Mini-PCR:

In some embodiments, after STA 1 a second STA is performed comprising a multiplex set of internal nested Forward primers and one (or a few) tag-specific Reverse primers. With this workflow, usually greater than 95% of sequences map to the intended targets. The nested primer may overlap with the outer Forward primer sequence but introduces additional 3′-end bases. In some embodiments, it is possible to use between one and 20 extra 3′ bases. Experiments have shown that using 9 or more extra 3′ bases in a 1,200-plex design works well.
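The relationship between the outer and nested Forward primers can be made concrete with a small helper. This is a hypothetical sketch under several assumptions:

- the template is given 5′ to 3′ on the strand that the Forward primer matches;
- coordinates are 0-based;
- the nested primer keeps the outer primer's length unless told otherwise.

It does not check melting temperature, specificity, or distance to the polymorphic site, all of which a real design would need.

```python
def nested_forward_primer(template, outer_start, outer_len, extra_3prime, nested_len=None):
    """Return a nested Forward primer ending extra_3prime bases 3' of the outer one.

    template: sense-strand sequence, 5'->3'; the outer primer is
    template[outer_start:outer_start + outer_len]. Any bases of the nested
    primer that lie before the outer primer's 3' end are shared with it.
    """
    if nested_len is None:
        nested_len = outer_len
    end = outer_start + outer_len + extra_3prime
    start = end - nested_len
    if start < 0 or end > len(template):
        raise ValueError("nested primer would run off the template")
    return template[start:end]


template = "AAAACCCCGGGGTTTT"
outer = template[0:8]  # "AAAACCCC"
nested = nested_forward_primer(template, outer_start=0, outer_len=8, extra_3prime=4)
```

Here the nested primer shares its first four bases with the 3′ end of the outer primer and adds four new 3′ bases. The experiments above found that 9 or more extra bases worked well in a 1,200-plex design.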

Fully Nested Mini-PCR:

After STA step 1, it is possible to perform a second multiplex PCR (or parallel multiplex PCRs of reduced complexity) with two nested primers carrying tags (A, a, B, b). In some embodiments, it is possible to use two full sets of primers. A fully nested mini-PCR protocol was used to perform 146-plex amplification on single cells and on three cells. This was done without the step of appending universal ligation adaptors and amplifying.

Hemi-Nested Mini-PCR:

It is possible to use target DNA that has adaptors at the fragment ends. STA is performed comprising a multiplex set of Forward primers (B) and one (or a few) tag-specific Reverse primers (A). A second STA can be performed using a universal tag-specific Forward primer and a target-specific Reverse primer. In this workflow, target-specific Forward and Reverse primers are used in separate reactions. This reduces the complexity of each reaction and prevents dimer formation between forward and reverse primers. Note that in this example, primers A and B may be considered to be first primers, and primers ‘a’ and ‘b’ may be considered to be inner primers. This method is a big improvement over direct PCR: it performs as well as direct PCR, but it avoids primer dimers. After the first round of the hemi-nested protocol one typically sees ~99% non-targeted DNA; however, after the second round there is typically a big improvement.

Triply Hemi-Nested Mini-PCR:

It is possible to use target DNA that has an adaptor at the fragment ends. STA is performed comprising a multiplex set of Forward primers (B) and one (or a few) tag-specific Reverse primers (A) and (a). A second STA can be performed using a universal tag-specific Forward primer and a target-specific Reverse primer. Note that in this example, primers ‘a’ and B may be considered to be inner primers, and A may be considered to be a first primer. Optionally, both A and B may be considered to be first primers, and ‘a’ may be considered to be an inner primer. The designation of reverse and forward primers may be switched. In this workflow, target-specific Forward and Reverse primers are used in separate reactions. This reduces the complexity of each reaction and prevents dimer formation between forward and reverse primers. This method is a big improvement over direct PCR: it performs as well as direct PCR, but it avoids primer dimers. After the first round of the hemi-nested protocol one typically sees ~99% non-targeted DNA; however, after the second round there is typically a big improvement.

One-Sided Nested Mini-PCR:

It is possible to use target DNA that has an adaptor at the fragment ends. STA may also be performed with a multiplex set of nested Forward primers, using the ligation adapter tag as the Reverse primer. A second STA may then be performed using a set of nested Forward primers and a universal Reverse primer. By using overlapping primers in the first and second STAs, this method can detect shorter target sequences than standard PCR. The method is typically performed on a sample of DNA that has already undergone step 1 above (appending of universal tags and amplification). The two nested primers are only on one side, and the other side uses the library tag. The method was performed on libraries of apoptotic supernatants and pregnancy plasma. With this workflow, around 60% of sequences mapped to the intended targets. Reads that contained the reverse adaptor sequence were not mapped, so this number is expected to be higher if those reads are mapped.

One-Sided Mini-PCR:

It is possible to use target DNA that has an adaptor at the fragment ends. STA may be performed with a multiplex set of Forward primers and one (or a few) tag-specific Reverse primers. This method can detect shorter target sequences than standard PCR. However, it may be relatively nonspecific, as only one target-specific primer is used. This protocol is effectively half of the one-sided nested mini-PCR.

Reverse Semi-Nested Mini-PCR:

It is possible to use target DNA that has an adaptor at the fragment ends. STA may be performed with a multiplex set of Forward primers and one (or a few) tag-specific Reverse primers. This method can detect shorter target sequences than standard PCR.

There may also be further variants that are simply iterations or combinations of the above methods, such as doubly nested PCR, where three sets of primers are used. Another variant is one-and-a-half-sided nested mini-PCR, where STA may also be performed with a multiplex set of nested Forward primers and one (or a few) tag-specific Reverse primers.

Note that in all of these variants, the identities of the Forward primer and the Reverse primer may be interchanged. In some embodiments, the nested variant can equally well be run without the initial library preparation, which comprises appending the adapter tags and a universal amplification step. In some embodiments, additional rounds of PCR may be included, with additional Forward and/or Reverse primers and amplification steps. These additional steps may be particularly useful if it is desirable to further increase the percentage of DNA molecules that correspond to the targeted loci.

Looped Ligation Adaptors

There are a number of ways to ligate universal tagged adaptors, for example when making a library for sequencing. One way is to blunt end the sample DNA, perform A-tailing, and ligate with adaptors that have a T-overhang. There are also a number of adaptors that can be ligated. For example, a Y-adaptor can be used. Such an adaptor consists of two strands of DNA:

- one strand has a double-stranded region and a forward primer region;
- the other strand has a region complementary to the double-stranded region of the first strand, and a reverse primer region.

When annealed, the double-stranded region may contain a T-overhang for ligation to double-stranded DNA with an A overhang.

In an embodiment, the adaptor can be a loop of DNA where the terminal regions are complementary. The loop region contains a forward primer tagged region (LFT), a reverse primer tagged region (LRT), and a cleavage site between the two. LFT refers to the ligation adaptor Forward tag, and LRT refers to the ligation adaptor Reverse tag. The complementary region may end in a T overhang, or another feature that may be used for ligation to the target DNA. The cleavage site may be a series of uracils for cleavage by UNG, or a sequence that may be recognized and cleaved by a restriction enzyme or other method of cleavage, or just a basic amplification. These adaptors can be used for any library preparation, for example for sequencing. They can be used in combination with any of the other methods described herein, for example the mini-PCR amplification methods.

Internally Tagged Primers

When using sequencing to determine the allele present at a given polymorphic locus, the sequence read typically begins upstream of the primer binding site (a) and proceeds to the polymorphic site (X). To avoid nonspecific hybridization, the primer binding site (the region of target DNA complementary to ‘a’) is typically 18 to 30 bp in length. Sequence tag ‘b’ is typically about 20 bp. In theory it can be any length longer than about 15 bp, though many people use the primer sequences sold by the sequencing platform company. The distance ‘d’ between ‘a’ and ‘X’ may be at least 2 bp so as to avoid allele bias.

Multiplexed PCR amplification, whether by the methods disclosed herein or by other methods, requires careful primer design to avoid excessive primer-primer interaction. As a result, the window of allowable distance ‘d’ between ‘a’ and ‘X’ may vary quite a bit: from 2 bp to 10 bp, from 2 bp to 20 bp, from 2 bp to 30 bp, or even from 2 bp to more than 30 bp. With certain primer configurations, sequence reads must therefore be a minimum length to reach the polymorphic locus. Depending on the lengths of ‘a’ and ‘d’, the sequence reads may need to be up to 60 or 75 bp.

Usually, the longer the sequence reads, the higher the cost and time of sequencing a given number of reads, so minimizing the necessary read length can save both time and money. In addition, bases read earlier in a read are, on average, read more accurately than those read later. Decreasing the necessary sequence read length can therefore also increase the accuracy of the measurements of the polymorphic region.

In an embodiment, termed internally tagged primers, the primer binding site (a) is split into a plurality of segments (a′, a″, a′″ . . . ). The sequence tag (b) is on a segment of DNA that lies between two of the primer binding sites. This configuration allows the sequencer to make shorter sequence reads. In an embodiment, a′+a″ should be at least about 18 bp, and can be as long as 30, 40, 50, 60, 80, 100 or more than 100 bp. In an embodiment, a″ should be at least about 6 bp, and in an embodiment is between about 8 and 16 bp. All other factors being equal, using internally tagged primers can cut the necessary sequence read length by at least 6 bp. The saving can be as much as 8 bp, 10 bp, 12 bp, or 15 bp, and even as many as 20 or 30 bp. This can result in a significant advantage in cost, time and accuracy.
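The read-length saving can be checked with simple arithmetic. This sketch assumes, for illustration only, that sequencing starts immediately 3′ of the sequencing tag ‘b’. A read must then cover:

- whatever part of the primer binding site lies 3′ of ‘b’;
- the gap ‘d’;
- the polymorphic base X.

In the standard layout the first item is all of ‘a’. With internally tagged primers it is only ‘a″’, so the saving equals the length of ‘a′’. The lengths below are illustrative values chosen from the ranges given above, and the function name is hypothetical.

```python
def required_read_length(bases_after_tag, d):
    """Read length needed to call polymorphic site X.

    bases_after_tag: primer binding-site bases between the sequencing tag 'b'
    and the 3' end of the primer (all of 'a' in the standard layout, only a''
    for an internally tagged primer); d: gap between the primer and X.
    The +1 is the polymorphic base itself.
    """
    return bases_after_tag + d + 1


a_len = 20                 # standard binding site 'a' (typically 18-30 bp)
a1_len, a2_len = 12, 8     # internally tagged: a' + a'' = 20, a'' = 8
d = 10                     # primer-to-SNP gap

standard = required_read_length(a_len, d)    # 31
internal = required_read_length(a2_len, d)   # 19
saving = standard - internal                 # 12, equal to a1_len
```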

Primers with Ligation Adaptor Binding Region

One issue with fragmented DNA is that, because the strands are short, a polymorphism is more likely to lie close to the end of a strand than it would be on a long strand. PCR capture of a polymorphism requires a primer binding site of suitable length on both sides of the polymorphism. A significant number of strands of DNA carrying the targeted polymorphism will therefore be missed due to insufficient overlap between the primer and the targeted binding site. In cases where the binding region is shorter than the 18 bp typically required for hybridization, the region (cr) on the primer that is complementary to the library tag can increase the binding energy to a point where the PCR can proceed. Any specificity lost due to a shorter binding region can be made up for by other PCR primers with suitably long target binding regions. This embodiment can be used in combination with direct PCR, or any of the other methods described herein, such as nested PCR, semi-nested PCR, hemi-nested PCR, one-sided nested or semi- or hemi-nested PCR, or other PCR protocols.

Sequencing data may be used to determine ploidy in combination with an analytical method that compares the observed allele data to the expected allele distributions for various hypotheses. In that setting, each additional read from an allele with a low depth of read yields more information than a read from an allele with a high depth of read. Ideally, therefore, depth of read (DOR) is uniform, with each locus having a similar number of representative sequence reads, and it is desirable to minimize the DOR variance. In an embodiment, the coefficient of variation of the DOR (which may be defined as the standard deviation of the DOR divided by the average DOR) can be decreased by increasing the annealing times. In some embodiments, the annealing times may be longer than 2 minutes, longer than 4 minutes, longer than ten minutes, longer than 30 minutes, and longer than one hour, or even longer. Since annealing is an equilibrium process, there is no limit to the improvement of DOR variance with increasing annealing times. In an embodiment, increasing the primer concentration may decrease the DOR variance.
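The coefficient of variation defined above can be illustrated numerically, along with the reason extra reads matter most at low-depth loci. This sketch uses the population standard deviation for the CV; the sample standard deviation would be an equally reasonable choice. As a simplifying assumption not stated in the source, it treats allele counts at a locus as binomial. The standard error of an allele fraction p at depth n is then sqrt(p(1 − p)/n). Function names are illustrative.

```python
import math
import statistics


def dor_cv(depths):
    """Coefficient of variation of depth of read: SD(DOR) / mean(DOR)."""
    return statistics.pstdev(depths) / statistics.mean(depths)


def allele_fraction_se(p, depth):
    """Binomial standard error of an allele fraction p measured from depth reads."""
    return math.sqrt(p * (1 - p) / depth)


uniform = [500, 500, 500, 500]
skewed = [50, 150, 900, 900]   # same total reads, very uneven coverage

# Precision gained from 100 extra reads at a shallow vs. a deep locus.
gain_shallow = allele_fraction_se(0.5, 100) - allele_fraction_se(0.5, 200)
gain_deep = allele_fraction_se(0.5, 1000) - allele_fraction_se(0.5, 1100)
```

The same 2,000 reads give a CV of 0 when spread evenly and roughly 0.8 when skewed. Under the binomial assumption, 100 extra reads shrink the standard error far more at a 100-read locus than at a 1,000-read locus, which is why even coverage is preferred.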

Diagnostic Box

In an embodiment, the present disclosure comprises a diagnostic box that is capable of partly or completely carrying out any of the methods described in this disclosure. In an embodiment, the diagnostic box may be located at a physician's office, a hospital laboratory, or any suitable location reasonably proximal to the point of patient care. The box may be able to run the entire method in a wholly automated fashion, or it may require one or a number of steps to be completed manually by a technician. In an embodiment, the box may be able to analyze at least the genotypic data measured on the transplant recipient plasma. In an embodiment, the box may be linked to means to transmit the genotypic data measured on it to an external computation facility. That facility may then analyze the genotypic data, and possibly also generate a report. The diagnostic box may include a robotic unit that is capable of transferring aqueous or liquid samples from one container to another. It may comprise a number of reagents, both solid and liquid. It may comprise a high-throughput sequencer. It may comprise a computer.

Primer Kit

In some embodiments, a kit may be formulated that comprises a plurality of primers designed to achieve the methods described in this disclosure. The primers may be outer forward and reverse primers or inner forward and reverse primers as disclosed herein; they could be primers that have been designed to have low binding affinity to other primers in the kit, as disclosed in the section on primer design; they could be hybrid capture probes or pre-circularized probes as described in the relevant sections; or some combination thereof. In an embodiment, a kit may be formulated for determining the transplant status of a transplant recipient and designed to be used with the methods disclosed herein, the kit comprising a plurality of inner forward primers and optionally a plurality of inner reverse primers, and optionally outer forward primers and outer reverse primers, where each of the primers is designed to hybridize to the region of DNA immediately upstream and/or downstream from one of the polymorphic sites on the target chromosome, and optionally additional chromosomes. In an embodiment, the primer kit may be used in combination with the diagnostic box described elsewhere in this document.

Compositions of DNA

When performing an informatics analysis on sequencing data measured on a mixture of donor and transplant recipient DNA to determine information pertaining to the transplant, for example the ploidy state of the transplant, it may be advantageous to measure the allele distributions at a set of alleles. Unfortunately, in many cases, such as when attempting to determine the state of a transplant from the DNA mixture found in the plasma of a transplant recipient blood sample, the amount of DNA available is not sufficient to directly measure the allele distributions in the mixture with good fidelity. In these cases, amplification of the DNA mixture will provide enough DNA molecules that the desired allele distributions may be measured with good fidelity. However, the amplification methods typically used to prepare DNA for sequencing are often very biased, meaning that they do not amplify both alleles at a polymorphic locus by the same amount. A biased amplification can result in allele distributions that are quite different from the allele distributions in the original mixture. For most purposes, highly accurate measurements of the relative amounts of alleles present at polymorphic loci are not needed. In contrast, in an embodiment of the present disclosure, amplification or enrichment methods that specifically enrich polymorphic alleles and preserve allelic ratios are advantageous.

A number of methods are described herein that may be used to preferentially enrich a sample of DNA at a plurality of loci in a way that minimizes allelic bias. One example is to use circularizing probes to target a plurality of loci, where the 3′ ends and 5′ ends of the pre-circularized probe are designed to hybridize to bases that are one or a few positions away from the polymorphic sites of the targeted allele. Another is to use PCR probes where the 3′ end of the PCR probe is designed to hybridize to bases that are one or a few positions away from the polymorphic sites of the targeted allele. Another is to use a split-and-pool approach to create mixtures of DNA where the preferentially enriched loci are enriched with low allelic bias without the drawbacks of direct multiplexing. Another is to use a hybrid capture approach where the capture probes are designed such that the region of the capture probe that hybridizes to the DNA flanking the polymorphic site of the target is separated from the polymorphic site by one or a small number of bases.

In the case where measured allele distributions at a set of polymorphic loci are used to determine the transplant state of a transplant recipient, it is desirable to preserve the relative amounts of alleles in a sample of DNA as it is prepared for genetic measurements. This preparation may involve WGA amplification, targeted amplification, selective enrichment techniques, hybrid capture techniques, circularizing probes or other methods meant to amplify the amount of DNA and/or selectively enhance the presence of molecules of DNA that correspond to certain alleles.

In some embodiments of the present disclosure, there is a set of DNA probes designed to target loci where the loci have maximal minor allele frequencies. In some embodiments of the present disclosure, there is a set of probes that are designed to target loci where there is the maximum likelihood of the transplant having a highly informative SNP at those loci. In some embodiments of the present disclosure, there is a set of probes that are designed to target loci where the probes are optimized for a given population subgroup. In some embodiments of the present disclosure, there is a set of probes that are designed to target loci where the probes are optimized for a given mix of population subgroups. In some embodiments of the present disclosure, there is a set of probes that are designed to target loci where the probes are optimized for a given pair of parents which are from different population subgroups that have different minor allele frequency profiles. In some embodiments of the present disclosure, there is a circularized strand of DNA that comprises at least one base pair that annealed to a piece of DNA that is of transplant origin. In some embodiments of the present disclosure, there is a circularized strand of DNA that circularized while at least some of the nucleotides were annealed to DNA that was of transplant origin. In some embodiments of the present disclosure, there is a set of probes wherein some of the probes target short tandem repeats, and some of the probes target single nucleotide polymorphisms. In some embodiments, the loci are selected for the purpose of non-invasive diagnosis of transplant status. In some embodiments, the loci are targeted using a method that could include circularizing probes, MIPs, capture by hybridization probes, probes on a SNP array, or combinations thereof. In some embodiments, the probes are used as circularizing probes, MIPs, capture by hybridization probes, probes on a SNP array, or combinations thereof. In some embodiments, the loci are sequenced for the purpose of determination of transplant status.

In the case where the relative informativeness of a sequence is greater when combined with relevant genotypic contexts, it follows that maximizing the number of sequence reads that contain a SNP for which the genotypic context is known may maximize the informativeness of the set of sequencing reads on the mixed sample. In an embodiment, the number of sequence reads that contain a SNP for which the genotypic contexts are known may be enhanced by using qPCR to preferentially amplify specific sequences. In an embodiment, the number of sequence reads that contain a SNP for which the genotypic contexts are known may be enhanced by using circularizing probes (for example, MIPs) to preferentially amplify specific sequences. In an embodiment, the number of sequence reads that contain a SNP for which the genotypic contexts are known may be enhanced by using a capture by hybridization method (for example SURESELECT) to preferentially amplify specific sequences. Different methods may be used to enhance the number of sequence reads that contain a SNP for which the genotypic contexts are known. In an embodiment, the targeting may be accomplished by extension ligation, ligation without extension, capture by hybridization, or PCR.

In a sample of fragmented genomic DNA, a fraction of the DNA sequences map uniquely to individual chromosomes; other DNA sequences may be found on different chromosomes. Note that DNA found in plasma is typically fragmented, often at lengths under 500 bp. In a typical genomic sample, roughly 3.3% of the mappable sequences will map to chromosome 13; 2.2% will map to chromosome 18; 1.35% will map to chromosome 21; 4.5% will map to chromosome X in a female; 2.25% will map to chromosome X in a male; and 0.73% will map to chromosome Y in a male. Also, among short sequences, approximately 1 in 20 sequences will contain a SNP, using the SNPs contained in dbSNP. The proportion may well be higher given that there may be many SNPs that have not been discovered.
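The percentages above give an untargeted baseline that targeting is meant to exceed. The sketch below turns them into an expected count of SNP-containing reads per chromosome for a given total read count. It assumes (hypothetically) that whether a read contains a SNP is independent of which chromosome it maps to; the dictionary and function names are illustrative only.

```python
# Approximate fractions of mappable sequences per chromosome in an untargeted
# genomic sample, as listed in the text.
CHROM_FRACTION = {
    "13": 0.033,
    "18": 0.022,
    "21": 0.0135,
    "X (female)": 0.045,
    "X (male)": 0.0225,
    "Y (male)": 0.0073,
}
SNP_RATE = 1 / 20  # roughly 1 in 20 short sequences overlaps a dbSNP SNP


def expected_snp_reads(total_reads, chrom):
    """Expected number of reads that map to `chrom` and also contain a SNP.

    Assumes SNP content is independent of the chromosome a read maps to.
    """
    return total_reads * CHROM_FRACTION[chrom] * SNP_RATE


for chrom in CHROM_FRACTION:
    print(chrom, round(expected_snp_reads(1_000_000, chrom)))
```

For example, one million untargeted reads would be expected to yield only about 675 SNP-containing reads on chromosome 21 (1,000,000 × 0.0135 × 0.05).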

In an embodiment of the present disclosure, targeting methods may be used to enhance the fraction of DNA in a sample of DNA that maps to a given chromosome such that the fraction significantly exceeds the percentages listed above that are typical for genomic samples. In an embodiment of the present disclosure, targeting methods may be used to enhance the fraction of DNA in a sample of DNA such that the percentage of sequences that contain a SNP is significantly greater than what is typical for genomic samples. In an embodiment of the present disclosure, targeting methods may be used to target DNA from a chromosome or from a set of SNPs in a mixture of donor-derived and transplant recipient-derived DNA for the purposes of determination of transplant status.

By making use of targeting approaches in sequencing the mixed sample, it may be possible to achieve a certain level of accuracy with fewer sequence reads. The accuracy may refer to sensitivity, it may refer to specificity, or it may refer to some combination thereof. The desired level of accuracy may be between 90% and 95%; it may be between 95% and 98%; it may be between 98% and 99%; it may be between 99% and 99.5%; it may be between 99.5% and 99.9%; it may be between 99.9% and 99.99%; it may be between 99.99% and 99.999%; it may be between 99.999% and 100%. Levels of accuracy above 95% may be referred to as high accuracy.

In an embodiment, accuracy may be measured by using linear regression on measured donor fractions as a function of the corresponding attempted spike levels to calculate a linearity, a slope value, and an intercept value. The linearity may be represented by the R² value determined from the linear regression analysis. In some embodiments, the linearity is from about 0.9 to 1.0; it may be from about 0.95 to 1.0; it may be from about 0.98 to 1.0; it may be from about 0.99 to 1.0; it may be from about 0.999 to 1.0; it may be 0.999. The slope value may be from 0.5 to 5.0; it may be from 0.5 to 2.5; it may be from 0.5 to 2.0; it may be from 0.5 to 1.5; it may be from 0.75 to 1.25; it may be from 0.9 to 1.2. The intercept value may be from about −0.01 to about 0.1; it may be from about −0.001 to about 0.1; it may be from about −0.0001 to about 0.1; it may be from about −0.0001 to about 0.01; it may be from about −0.0001 to about 0.001; it may be from about −0.0001 to about 0.0001; it may be 0.
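A minimal sketch of the linearity analysis described above: an ordinary least-squares fit of measured donor fraction against attempted spike level, returning slope, intercept, and R². The function name and the dilution series are hypothetical; the source does not specify the regression software or data used.

```python
def linearity_fit(spike_levels, measured):
    """Ordinary least-squares fit of measured vs. attempted donor fraction.

    Returns (slope, intercept, r_squared).
    """
    n = len(spike_levels)
    if n != len(measured) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(spike_levels) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in spike_levels)
    syy = sum((y - mean_y) ** 2 for y in measured)
    if sxx == 0 or syy == 0:
        raise ValueError("spike levels and measurements must both vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(spike_levels, measured))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(spike_levels, measured))
    r_squared = 1 - ss_res / syy
    return slope, intercept, r_squared


# Hypothetical dilution series (fractions, not percentages).
spikes = [0.001, 0.002, 0.005, 0.01, 0.02]
observed = [0.0011, 0.0019, 0.0052, 0.0098, 0.0203]
slope, intercept, r2 = linearity_fit(spikes, observed)
print(f"slope={slope:.3f} intercept={intercept:.5f} R^2={r2:.4f}")
```

The three returned values correspond directly to the linearity, slope, and intercept ranges listed above.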

In an embodiment, accuracy may refer to precision as determined by calculating a coefficient of variation (CV) and a confidence interval of 95% for the determination of the targeted donor fraction. Estimation of precision by calculating a CV may also be referred to as a measurement of reproducibility. The CV value may be represented with a confidence interval. The confidence interval for the CV may be 99%; it may be 95%; it may be 90%. The CV may be less than 10%; it may be less than 9%; it may be less than 8%; it may be less than 7%; it may be less than 6%; it may be less than 5%; it may be less than 4%; it may be less than 3%; it may be less than 2%; it may be less than 1%. The CV may be different depending on the targeted donor fraction. For a 0.6% targeted donor fraction, the CV may be 1.85% with a confidence interval of 95%. For a 2.4% targeted donor fraction, the CV may be 1.22% with a confidence interval of 95%. The CV may be different depending on the amount of DNA in the sample. For example, for 15 ng DNA, the CV may be 3.1% with a 95% confidence interval; for 30 ng DNA, the CV may be 3.07% with a 95% confidence interval; for 45 ng DNA, the CV may be 1.99% with a 95% confidence interval.
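The text does not say how the confidence interval on the CV is obtained. As one possible approach, offered only as an assumption, the sketch below computes the CV of replicate donor-fraction measurements and a percentile-bootstrap interval around it. The replicate values, seed, and function names are hypothetical.

```python
import random
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation: sample standard deviation / mean."""
    return stdev(values) / mean(values)


def bootstrap_cv_ci(values, level=0.95, n_boot=2000, seed=0):
    """Percentile-bootstrap confidence interval for the CV of replicates."""
    if len(values) < 2:
        raise ValueError("need at least two replicate measurements")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    boot = sorted(
        cv([rng.choice(values) for _ in values]) for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    lower = boot[int(tail * n_boot)]
    upper = boot[min(n_boot - 1, int((1 - tail) * n_boot))]
    return lower, upper


# Hypothetical replicate donor-fraction measurements (percent) at one target level.
replicates = [0.61, 0.59, 0.60, 0.62, 0.58, 0.60]
low, high = bootstrap_cv_ci(replicates)
print(f"CV = {cv(replicates):.2%}, 95% bootstrap CI {low:.2%} to {high:.2%}")
```

A parametric interval (for example, one based on a normality assumption for the replicates) would be an equally valid alternative; the bootstrap is shown because it makes no distributional assumption.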

In an embodiment of the present disclosure, an accurate transplant status determination may be made by using targeted sequencing, using any method of targeting, for example qPCR, ligand mediated PCR, other PCR methods, capture by hybridization, or circularizing probes, wherein the number of loci along a chromosome that need to be targeted may be between 5,000 and 2,000 loci; it may be between 2,000 and 1,000 loci; it may be between 1,000 and 500 loci; it may be between 500 and 300 loci; it may be between 300 and 200 loci; it may be between 200 and 150 loci; it may be between 150 and 100 loci; it may be between 100 and 50 loci; it may be between 50 and 20 loci; it may be between 20 and 10 loci. Optimally, it may be between 100 and 500 loci. The high level of accuracy may be achieved by targeting a small number of loci and executing an unexpectedly small number of sequence reads. The number of reads may be between 100 million and 50 million reads; the number of reads may be between 50 million and 20 million reads; the number of reads may be between 20 million and 10 million reads; the number of reads may be between 10 million and 5 million reads; the number of reads may be between 5 million and 2 million reads; the number of reads may be between 2 million and 1 million; the number of reads may be between 1 million and 500,000; the number of reads may be between 500,000 and 200,000; the number of reads may be between 200,000 and 100,000; the number of reads may be between 100,000 and 50,000; the number of reads may be between 50,000 and 20,000; the number of reads may be between 20,000 and 10,000; the number of reads may be below 10,000. Fewer reads are necessary for larger amounts of input DNA.

In some embodiments, a composition is described comprising a mixture of DNA of donor origin and DNA of recipient origin, wherein the percent of sequences that uniquely map to a chromosome and that contain at least one single nucleotide polymorphism is greater than 0.2%, greater than 0.3%, greater than 0.4%, greater than 0.5%, greater than 0.6%, greater than 0.7%, greater than 0.8%, greater than 0.9%, greater than 1%, greater than 1.2%, greater than 1.4%, greater than 1.6%, greater than 1.8%, greater than 2%, greater than 2.5%, greater than 3%, greater than 4%, greater than 5%, greater than 6%, greater than 7%, greater than 8%, greater than 9%, greater than 10%, greater than 12%, greater than 15%, or greater than 20%, and where the chromosome is taken from the group 13, 18, 21, X, or Y. In some embodiments of the present disclosure, there is a composition comprising a mixture of DNA of donor origin and DNA of recipient origin, wherein the percent of sequences that uniquely map to a chromosome and that contain at least one single nucleotide polymorphism from a set of single nucleotide polymorphisms is greater than 0.15%, greater than 0.2%, greater than 0.3%, greater than 0.4%, greater than 0.5%, greater than 0.6%, greater than 0.7%, greater than 0.8%, greater than 0.9%, greater than 1%, greater than 1.2%, greater than 1.4%, greater than 1.6%, greater than 1.8%, greater than 2%, greater than 2.5%, greater than 3%, greater than 4%, greater than 5%, greater than 6%, greater than 7%, greater than 8%, greater than 9%, greater than 10%, greater than 12%, greater than 15%, or greater than 20%, where the chromosome is taken from the set of chromosomes 13, 18, 21, X and Y, and where the number of single nucleotide polymorphisms in the set of single nucleotide polymorphisms is between 1 and 10, between 10 and 20, between 20 and 50, between 50 and 100, between 100 and 200, between 200 and 500, between 500 and 1,000, between 1,000 and 2,000, between 2,000 and 5,000, between 5,000 and 10,000, between 10,000 and 20,000, between 20,000 and 50,000, or between 50,000 and 100,000.

In theory, each cycle in the amplification doubles the amount of DNA present; in reality, however, the degree of amplification is slightly lower than two. In theory, amplification, including targeted amplification, will result in bias-free amplification of a DNA mixture; in reality, however, different alleles tend to be amplified to different extents. When DNA is amplified, the degree of allelic bias typically increases with the number of amplification steps. In some embodiments, the methods described herein involve amplifying DNA with a low level of allelic bias. Since the allelic bias compounds with each additional cycle, one can determine the per-cycle allelic bias by calculating the nth root of the overall bias, where n is the base 2 logarithm of the degree of enrichment. In some embodiments, there is a composition comprising a second mixture of DNA, where the second mixture of DNA has been preferentially enriched at a plurality of polymorphic loci from a first mixture of DNA, where the degree of enrichment is at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000 or at least 1,000,000, and where the ratio of the alleles in the second mixture of DNA at each locus differs from the ratio of the alleles at that locus in the first mixture of DNA by a factor that is, on average, less than 1,000%, 500%, 200%, 100%, 50%, 20%, 10%, 5%, 2%, 1%, 0.5%, 0.2%, 0.1%, 0.05%, 0.02%, or 0.01%. In some embodiments, there is a composition comprising a second mixture of DNA, where the second mixture of DNA has been preferentially enriched at a plurality of polymorphic loci from a first mixture of DNA, where the per-cycle allelic bias for the plurality of polymorphic loci is, on average, less than 10%, 5%, 2%, 1%, 0.5%, 0.2%, 0.1%, 0.05%, or 0.02%. In some embodiments, the plurality of polymorphic loci comprises at least 10 loci, at least 20 loci, at least 50 loci, at least 100 loci, at least 200 loci, at least 500 loci, at least 1,000 loci, at least 2,000 loci, at least 5,000 loci, at least 10,000 loci, at least 20,000 loci, or at least 50,000 loci.
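The per-cycle calculation above can be sketched in a few lines. Here the overall bias is assumed (for illustration) to be expressed as a multiplicative factor by which the allele ratio after enrichment differs from the original ratio, with 1.0 meaning no bias; the function name is hypothetical.

```python
import math


def per_cycle_bias(overall_bias, enrichment):
    """Per-cycle allelic bias factor from the overall bias and enrichment.

    n = log2(enrichment) is the effective number of doublings, and the
    per-cycle factor is the n-th root of the overall bias factor.
    """
    if overall_bias <= 0 or enrichment <= 1:
        raise ValueError("overall_bias must be > 0 and enrichment > 1")
    n = math.log2(enrichment)
    return overall_bias ** (1 / n)


# Example: 1,024-fold enrichment (n = 10 doublings) with a 1.5x overall bias.
factor = per_cycle_bias(1.5, 1024)
print(f"per-cycle bias factor {factor:.4f} ({(factor - 1) * 100:.1f}% per cycle)")
```

In this example an overall 1.5-fold distortion corresponds to roughly 4.1% per cycle, which shows why a seemingly small per-cycle bias matters at high degrees of enrichment.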

Maximum Likelihood Estimates

Most methods known in the art for detecting the presence or absence of a biological phenomenon or medical condition involve the use of a single-hypothesis rejection test, where a metric that is correlated with the condition is measured, and if the metric is on one side of a given threshold the condition is present, while if the metric falls on the other side of the threshold the condition is absent. A single-hypothesis rejection test only looks at the null distribution when deciding between the null and alternate hypotheses. Without taking into account the alternate distribution, one cannot estimate the likelihood of each hypothesis given the observed data and therefore cannot calculate a confidence on the call. Hence with a single-hypothesis rejection test, one gets a yes or no answer without a sense of the confidence associated with the specific case.

In some embodiments, the method disclosed herein is able to detect the presence or absence of a biological phenomenon or medical condition using a maximum likelihood method. This is a substantial improvement over a method using a single-hypothesis rejection technique, as the threshold for calling the absence or presence of the condition can be adjusted as appropriate for each case.

The maximum likelihood estimation method uses the distributions associated with each hypothesis to estimate the likelihood of the data conditioned on each hypothesis. These conditional probabilities can then be converted to a hypothesis call and a confidence. Similarly, the maximum a posteriori estimation method uses the same conditional probabilities as the maximum likelihood estimate, but also incorporates population priors when choosing the best hypothesis and determining confidence.
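A minimal sketch of the difference between the two estimators: given the log-likelihood of the data under each hypothesis, MLE picks the largest likelihood, while MAP first adds the log prior. Reporting the winning hypothesis's share of the normalized likelihoods (or posteriors) as its confidence is a common convention assumed here. The hypothesis labels, log-likelihood values, and priors are hypothetical.

```python
import math


def call_hypothesis(log_likelihoods, priors=None):
    """Select a hypothesis by MLE (priors=None) or MAP (priors given).

    log_likelihoods: dict mapping hypothesis -> log P(data | hypothesis)
    priors: optional dict mapping hypothesis -> prior probability (> 0)
    Returns (best_hypothesis, confidence), where the confidence is the best
    hypothesis's share of the normalized likelihoods (MLE) or posteriors (MAP).
    """
    scores = {
        h: ll + (math.log(priors[h]) if priors else 0.0)
        for h, ll in log_likelihoods.items()
    }
    top = max(scores.values())
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = {h: math.exp(s - top) for h, s in scores.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total


# Hypothetical log-likelihoods for two transplant-status hypotheses.
ll = {"stable": -120.4, "acute rejection": -121.9}
print(call_hypothesis(ll))  # MLE
print(call_hypothesis(ll, priors={"stable": 0.1, "acute rejection": 0.9}))  # MAP
```

With these illustrative numbers, MLE favors "stable", but a strong enough prior toward "acute rejection" flips the MAP call, which is exactly how population priors enter the decision.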

Therefore, the use of a maximum likelihood estimate (MLE) technique, or the closely related maximum a posteriori (MAP) technique, gives two advantages: first, it increases the chance of a correct call, and second, it allows a confidence to be calculated for each call. In an embodiment, selecting the ploidy state corresponding to the hypothesis with the greatest probability is carried out using maximum likelihood estimates or maximum a posteriori estimates. In an embodiment, a method is disclosed for determining the transplant status in a transplant recipient that involves taking any method currently known in the art that uses a single-hypothesis rejection technique and reformulating it such that it uses an MLE or MAP technique. Some examples of methods that can be significantly improved by applying these techniques can be found in U.S. Pat. Nos. 8,008,018, 7,888,017, and 7,332,277.

In an embodiment, a method is described for determining the transplant status of a transplant recipient from a transplant recipient plasma sample comprising donor-derived and recipient genomic DNA, the method comprising: obtaining a transplant recipient plasma sample; measuring the DNA fragments found in the plasma sample with a high throughput sequencer; calculating the fraction of donor-derived DNA in the plasma sample; and using an MLE or MAP technique to determine which of the distributions is most likely to be correct, thereby indicating whether the transplant is undergoing acute rejection, borderline rejection, or other injury, or is stable. In an embodiment, measuring the DNA from the plasma may involve conducting massively parallel shotgun sequencing. In an embodiment, measuring the DNA from the plasma sample may involve sequencing DNA that has been preferentially enriched, for example through targeted amplification, at a plurality of polymorphic or non-polymorphic loci. The purpose of the preferential enrichment is to increase the number of sequence reads that are informative for the transplant status determination.

Transplant Status Calling Informatics Methods

Described herein is a method for determining the state of a transplant given sequence data. In some embodiments, this sequence data may be measured on a high throughput sequencer. In some embodiments, the sequence data may be measured on DNA that originated from free floating DNA isolated from recipient blood, wherein the free floating DNA comprises some DNA of transplant recipient origin and some DNA of transplant donor origin. This section describes one embodiment of the present disclosure in which the state of the transplant is determined assuming that the fraction of donor-derived DNA in the analyzed mixture is not known and will be estimated from the data. It also describes an embodiment in which the fraction of donor-derived DNA (“donor fraction”) or the percentage of donor-derived DNA in the mixture can be measured by another method. In some embodiments the donor fraction can be calculated using only the genotyping measurements made on the blood sample itself, which is a mixture of donor and transplant recipient DNA. In some embodiments the fraction may be calculated also using the measured or otherwise known genotype of the transplant recipient and/or the measured or otherwise known genotype of the transplant donor. In another embodiment, the state of the transplant can be determined solely based on the calculated fraction of donor-derived DNA.

Informatics methods useful and relevant to the methods disclosed herein can be found in U.S. Patent Publication No. 20180025109, incorporated by reference herein, wherein the informatics methods are disclosed in the context of determination of the genetic state of a fetus via non-invasive prenatal testing.

For example, in an embodiment, the informatics method may incorporate random bias. As is often the case, suppose that there is a bias in the measurements, so that the probability of getting an A on this SNP is equal to q, which is a bit different from p as defined above. How much q differs from p depends on the accuracy of the measurement process and a number of other factors, and can be quantified by standard deviations of q away from p. In an embodiment, it is possible to model q as having a beta distribution, with parameters α, β chosen so that the distribution is centered at p with some specified standard deviation s. In particular, this gives X | q ~ Bin(q, D_i), where q ~ Beta(α, β). If we let E(q) = p and V(q) = s², the parameters α, β can be derived as α = pN, β = (1 − p)N, where

$N = \frac{p(1-p)}{s^{2}} - 1.$
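The moment matching above can be checked numerically. The sketch below derives α and β from a target mean p and standard deviation s, then recomputes the Beta mean α/(α+β) and variance αβ/((α+β)²(α+β+1)) to confirm they reproduce p and s². The function names are hypothetical.

```python
def beta_params(p, s):
    """Moment-matched Beta(alpha, beta) with mean p and standard deviation s.

    Uses N = p(1 - p)/s**2 - 1, alpha = p*N, beta = (1 - p)*N.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    n = p * (1 - p) / s ** 2 - 1
    if n <= 0:
        raise ValueError("s too large: requires s**2 < p*(1 - p)")
    return p * n, (1 - p) * n


def beta_mean_var(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha / total, alpha * beta / (total ** 2 * (total + 1))


alpha, beta = beta_params(0.5, 0.1)
print(alpha, beta, beta_mean_var(alpha, beta))
```

For p = 0.5 and s = 0.1 this gives N = 24 and α = β = 12 (up to floating-point rounding). Note that N must be positive, which requires s² < p(1 − p).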

In some embodiments, the method may be written to specifically take intoaccount additional noise, differential sample quality, differential SNPquality, and random sampling bias. In some embodiments, the methodinvolves several steps that each introduce different kind of noiseand/or bias to the final model:

(1) Suppose the first sample, which comprises a mixture of maternal and fetal DNA, contains an original amount of DNA of N₀ molecules, usually in the range 1,000-40,000, where p is the true proportion of reference alleles.

(2) In the amplification using the universal ligation adaptors, assume that N₁ molecules are sampled; usually N₁ ≈ N₀/2, and random sampling bias is introduced by this sampling.

The amplified sample may contain a number of molecules N₂, where N₂ >> N₁. Let X₁ represent the number of molecules carrying the reference allele (on a per-SNP basis) out of the N₁ sampled molecules; the variation in p₁ = X₁/N₁ introduces random sampling bias throughout the rest of the protocol. This sampling bias is included in the model by using a Beta-Binomial (BB) distribution instead of a simple Binomial distribution. Parameter N of the Beta-Binomial distribution may be estimated later on a per-sample basis from training data, after adjusting for leakage and amplification bias, on SNPs with 0 < p < 1. Leakage is the tendency for a SNP to be read incorrectly.

(3) The amplification step will amplify any allelic bias, introducing amplification bias through possibly uneven amplification. Suppose that one allele at a locus is amplified f times and the other allele at that locus is amplified g times, where f = g·e^b, so that b = 0 indicates no bias. The bias parameter b is centered at 0 and indicates how much more or less the A allele gets amplified than the B allele at a particular SNP. The parameter b may differ from SNP to SNP and may be estimated on a per-SNP basis, for example from training data.

(4) The sequencing step involves sequencing a sample of amplified molecules. In this step there may be leakage, which is the situation where a SNP is read incorrectly. Leakage may result from any number of problems, and may cause a SNP to be read not as the correct allele A but as another allele B found at that locus, or as an allele C or D not typically found at that locus. Suppose the sequencing measures the sequence data of a number of DNA molecules from an amplified sample of size N₃, where N₃ < N₂. In some embodiments, N₃ may be in the range of 20,000 to 100,000; 100,000 to 500,000; 500,000 to 4,000,000; 4,000,000 to 20,000,000; or 20,000,000 to 100,000,000. Each sampled molecule has a probability p_g of being read correctly, in which case it will show up correctly as allele A. With probability 1 − p_g the molecule will be incorrectly read as an allele unrelated to the original molecule, and will then look like allele A with probability p_r, allele B with probability p_m, or allele C or D with probability p_o, where p_r + p_m + p_o = 1. Parameters p_g, p_r, p_m, p_o are estimated on a per-SNP basis from the training data.

Different protocols may involve similar steps with variations in themolecular biology steps resulting in different amounts of randomsampling, different levels of amplification and different leakage bias.The following model may be equally well applied to each of these cases.The model for the amount of DNA sampled, on per SNP basis, is given by:

$X_3 \sim \mathrm{BetaBinomial}\left(L(F(p,b), p_r, p_g),\; N \cdot H(p,b)\right)$

where p is the true amount of reference DNA, b is the per-SNP bias, and, as described above, p_g is the probability of a correct read and p_r is the probability that a bad read nonetheless serendipitously looks like the correct allele; and:

$F(p,b) = \frac{p e^{b}}{p e^{b} + (1-p)}, \qquad H(p,b) = \frac{\left(e^{b} p + (1-p)\right)^{2}}{e^{b}}, \qquad L(p, p_r, p_g) = p \, p_g + p_r (1 - p_g).$

In some embodiments, the method uses a Beta-Binomial distribution instead of a simple binomial distribution; this accounts for the random sampling bias. Parameter N of the Beta-Binomial distribution is estimated on a per-sample basis, as needed. Using the bias corrections F(p,b) and H(p,b), instead of just p, accounts for the amplification bias. The bias parameter b is estimated on a per-SNP basis from training data ahead of time.

In some embodiments the method uses the leakage correction L(p, p_r, p_g) instead of just p; this accounts for the leakage bias, i.e. varying SNP and sample quality. In some embodiments, parameters p_g, p_r, p_o are estimated on a per-SNP basis from the training data ahead of time. In some embodiments, the parameters p_g, p_r, p_o may be updated on the fly with the current sample, to account for varying sample quality.

The model described herein is quite general and can account for bothdifferential sample quality and differential SNP quality. Differentsamples and SNPs are treated differently, as exemplified by the factthat some embodiments use Beta-Binomial distributions whose mean andvariance are a function of the original amount of DNA, as well as sampleand SNP quality.
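A minimal sketch of the X₃ model, using the F, H, and L functions exactly as defined above. The text's BetaBinomial(mean, N·H) notation does not name the number of trials, so this sketch assumes (hypothetically) that the per-SNP depth of read is the number of trials and that N·H(p,b) is the Beta concentration α + β. The uppercase function names mirror the document's notation; all other names are illustrative.

```python
import math


def F(p, b):
    """Allele fraction after amplification bias b (b = 0 means no bias)."""
    return p * math.exp(b) / (p * math.exp(b) + (1 - p))


def H(p, b):
    """Bias-dependent scaling of the Beta-Binomial size parameter."""
    return (math.exp(b) * p + (1 - p)) ** 2 / math.exp(b)


def L(p, p_r, p_g):
    """Leakage-corrected fraction: correct reads plus lucky bad reads."""
    return p * p_g + p_r * (1 - p_g)


def log_betabinom_pmf(k, n, mean, size):
    """log P(K = k) for a Beta-Binomial with n trials, mean `mean`,
    and Beta concentration alpha + beta = `size`."""
    a, b = mean * size, (1 - mean) * size
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_beta_ratio = (math.lgamma(k + a) + math.lgamma(n - k + b)
                      - math.lgamma(n + a + b)
                      - (math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)))
    return log_comb + log_beta_ratio


def log_lik_x3(x3, depth, p, b, p_r, p_g, n_size):
    """log P(X3 = x3) under X3 ~ BetaBinomial(L(F(p,b), p_r, p_g), N*H(p,b)).

    Assumption: `depth` is the number of trials and n_size * H(p, b) is the
    Beta concentration.
    """
    mean = L(F(p, b), p_r, p_g)
    return log_betabinom_pmf(x3, depth, mean, n_size * H(p, b))


# Hypothetical SNP: 20 reads, true reference fraction 0.3, mild bias.
for k in (4, 6, 8):
    print(k, math.exp(log_lik_x3(k, 20, 0.3, 0.2, 0.1, 0.99, 50.0)))
```

Working in log space with `math.lgamma` avoids overflow in the binomial coefficient and Beta functions at realistic read depths.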

Platform modeling

An observation at a SNP consists of the number of mapped reads with each allele present, n_a and n_b, which sum to the depth of read d. Assume that thresholds have already been applied to the mapping probabilities and phred scores such that the mappings and allele observations can be considered correct. A phred score is a numerical measure that relates to the probability that a particular measurement at a particular base is wrong. In an embodiment, where the base has been measured by sequencing, the phred score may be calculated from the ratio of the dye intensity corresponding to the called base to the dye intensity of the other bases. The simplest model for the observation likelihood is a binomial distribution, which assumes that each of the d reads is drawn independently from a large pool that has allele ratio r. Equation 2 describes this model.

$P(n_a, n_b \mid r) = p_{\mathrm{bino}}(n_a;\, n_a + n_b,\, r) = \binom{n_a + n_b}{n_a} r^{n_a} (1 - r)^{n_b} \qquad (2)$
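The sketch below first applies a phred-score threshold to allele observations, using the standard definition Q = −10·log₁₀(P_error), and then evaluates the binomial likelihood of Equation 2 for two candidate allele ratios. The read list and the threshold of 30 are hypothetical choices for illustration.

```python
import math


def phred_to_error(q):
    """Error probability implied by a phred score, using Q = -10 * log10(P_err)."""
    return 10 ** (-q / 10)


def binomial_likelihood(n_a, n_b, r):
    """Equation 2: P(n_a, n_b | r) when each of the d = n_a + n_b reads is an
    independent draw showing allele A with probability r."""
    return math.comb(n_a + n_b, n_a) * r ** n_a * (1 - r) ** n_b


# Hypothetical SNP observation: (allele, phred score) for each mapped read.
reads = [("A", 35), ("B", 38), ("A", 12), ("B", 40), ("A", 33), ("B", 31)]
MIN_PHRED = 30  # hypothetical threshold, i.e. at most a 0.1% base-call error
kept = [allele for allele, q in reads if q >= MIN_PHRED]
n_a, n_b = kept.count("A"), kept.count("B")
for r in (0.5, 0.55):
    print(r, binomial_likelihood(n_a, n_b, r))
```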

The binomial model can be extended in a number of ways. When the donor and recipient genotypes are either all A or all B, the expected allele ratio in plasma will be 0 or 1, and the binomial probability will be degenerate: any observation of the unexpected allele has probability zero. In practice, however, unexpected alleles are sometimes observed. In an embodiment, it is possible to use a corrected allele ratio r̂ = 1/(n_a + n_b) to allow a small number of the unexpected allele. In an embodiment, it is possible to use training data to model the rate at which the unexpected allele appears at each SNP, and use this model to correct the expected allele ratio. When the expected allele ratio is not 0 or 1, the observed allele ratio may not converge to the expected allele ratio even at a sufficiently high depth of read, due to amplification bias or other phenomena. The allele ratio can then be modeled as a beta distribution centered at the expected allele ratio, leading to a beta-binomial distribution for P(n_a, n_b | r), which has higher variance than the binomial.
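A minimal sketch of the corrected allele ratio: when the expected ratio is exactly 0, it is replaced with r̂ = 1/(n_a + n_b), so that a stray unexpected read no longer drives the likelihood to zero. Applying 1 − r̂ when the expected ratio is exactly 1 is an assumption made here by symmetry; the function names are hypothetical.

```python
import math


def binomial_likelihood(n_a, n_b, r):
    """Equation 2: binomial probability of n_a A reads and n_b B reads."""
    return math.comb(n_a + n_b, n_a) * r ** n_a * (1 - r) ** n_b


def corrected_ratio(r, n_a, n_b):
    """Nudge an expected allele ratio of exactly 0 or 1 away from the boundary.

    Uses r_hat = 1 / (n_a + n_b) when r == 0, and (as an assumption, by
    symmetry) 1 - r_hat when r == 1; other ratios are returned unchanged.
    """
    d = n_a + n_b
    if d < 2:
        return r  # too few reads for the correction to be meaningful
    r_hat = 1 / d
    if r == 0:
        return r_hat
    if r == 1:
        return 1 - r_hat
    return r


# Donor and recipient both homozygous B (expected r = 0), yet one A read is seen.
n_a, n_b = 1, 499
print(binomial_likelihood(n_a, n_b, 0.0))  # raw model: 0.0
print(binomial_likelihood(n_a, n_b, corrected_ratio(0.0, n_a, n_b)))
```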

The platform model for the response at a single SNP is defined in equation 3 as F(a, b, g_c, g_m, f), the probability of observing n_a = a and n_b = b given the maternal and fetal genotypes, which also depends on the fetal fraction through equation 1. The functional form of F may be a binomial distribution, a beta-binomial distribution, or similar functions as discussed above.

$F(a, b, g_c, g_m, f) = P(n_a = a, n_b = b \mid g_c, g_m, f) = P\left(n_a = a, n_b = b \mid r(g_c, g_m, f)\right) \qquad (3)$

In an embodiment, a method of the present disclosure used to determine the transplant status of the transplant recipient involves taking into account the fraction of donor DNA in the sample. In another embodiment of the present disclosure, the method involves the use of maximum likelihood estimations. In an embodiment, a method of the present disclosure involves calculating the percent of DNA in a sample that is donor-derived. In an embodiment, the threshold for calling acute rejection of a transplant is adaptively adjusted based on the calculated percent donor-derived DNA.

In an embodiment of the present disclosure, the fraction of donor-derived DNA, or the percentage of donor DNA in the mixture, can be measured. In some embodiments the fraction can be calculated using only the genotyping measurements made on the transplant recipient plasma sample itself, which is a mixture of donor-derived and transplant recipient DNA. In some embodiments the fraction may be calculated also using the measured or otherwise known genotype of the transplant recipient and/or the measured or otherwise known genotype of the transplant donor. In some embodiments the percent donor DNA may be calculated using the measurements made on the mixture of donor-derived and transplant recipient DNA along with the knowledge of the genotypic contexts. In an embodiment, the fraction of donor DNA may be calculated using population frequencies to adjust the model of the probability of particular allele measurements.
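When both genotypes are known, the donor fraction can be estimated by maximizing the summed binomial log-likelihood over the genotyped loci. The sketch below uses a simple grid search; the linear mixture model for the expected ratio, the (recipient genotype, donor genotype, n_a, n_b) record layout, and the grid `step` are illustrative assumptions rather than the disclosed method.

```python
from math import log

# Fraction of A alleles carried by each biallelic genotype.
A_FRACTION = {"AA": 1.0, "AB": 0.5, "BB": 0.0}


def log_likelihood(f, loci, eps=1e-3):
    """Binomial log-likelihood of donor fraction f, up to a constant in f.

    Each locus is (recipient_genotype, donor_genotype, n_a, n_b).
    """
    total = 0.0
    for recipient, donor, n_a, n_b in loci:
        r = f * A_FRACTION[donor] + (1.0 - f) * A_FRACTION[recipient]
        r = min(max(r, eps), 1.0 - eps)  # avoid log(0) at the extremes
        total += n_a * log(r) + n_b * log(1.0 - r)
    return total


def estimate_donor_fraction(loci, step=0.001):
    """Grid-search maximum-likelihood estimate of the donor fraction in [0, 1]."""
    n_steps = int(round(1.0 / step))
    grid = [i * step for i in range(n_steps + 1)]
    return max(grid, key=lambda f: log_likelihood(f, loci))
```

With loci `[("AA", "BB", 900, 100), ("AA", "AB", 950, 50), ("BB", "AB", 50, 950)]`, each of which is exactly what a 10% donor fraction predicts, the estimate is 0.1. A finer grid or a one-dimensional optimizer could replace the grid search; the grid keeps the sketch dependency-free.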

In an embodiment of the present disclosure, a confidence may be calculated on the accuracy of the determination of transplant status. In an embodiment, the confidence of the hypothesis of greatest likelihood ($H_{major}$) may be calculated as $(1 - H_{major})/\Sigma(\text{all } H)$. It is possible to determine the confidence of a hypothesis if the distributions of all of the hypotheses are known. It is possible to determine the distribution of all of the hypotheses if the donor and recipient genotype information is known. In an embodiment, one may use the knowledge of the distribution of a test statistic around a normal hypothesis and around an abnormal hypothesis both to determine the reliability of the call and to refine the threshold so as to make a more reliable call. This is particularly useful when the amount and/or percentage of donor DNA in the mixture is low.

Further Discussion of the Method

In an embodiment, a method disclosed herein utilizes a quantitative measure of the number of independent observations of each allele at a polymorphic locus, where this does not involve calculating the ratio of the alleles. This is different from methods, such as some microarray-based methods, which provide information about the ratio of two alleles at a locus but do not quantify the number of independent observations of either allele. Some methods known in the art can provide quantitative information regarding the number of independent observations, but the calculations leading to the ploidy determination utilize only the allele ratios and do not utilize the quantitative information. To illustrate the importance of retaining information about the number of independent observations, consider a sample locus with two alleles, A and B. In a first experiment twenty A alleles and twenty B alleles are observed; in a second experiment 200 A alleles and 200 B alleles are observed. In both experiments the ratio A/(A+B) is equal to 0.5; however, the second experiment conveys more information than the first about the certainty of the frequency of the A or B allele. The instant method, rather than utilizing the allele ratios, uses the quantitative data to more accurately model the most likely allele frequencies at each polymorphic locus.
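A small worked example makes the point concrete: the binomial log-likelihood ratio between two candidate allele frequencies scales with the number of observations even when the observed ratio is identical. The candidate frequencies 0.5 and 0.4 are arbitrary choices for illustration.

```python
from math import log


def log_likelihood_ratio(n_a: int, n_b: int, p1: float, p0: float) -> float:
    """Log-likelihood of A-allele frequency p1 versus p0, given n_a A and n_b B observations."""
    return n_a * log(p1 / p0) + n_b * log((1.0 - p1) / (1.0 - p0))


# Same allele ratio (0.5) in both experiments, different numbers of observations.
small = log_likelihood_ratio(20, 20, 0.5, 0.4)
large = log_likelihood_ratio(200, 200, 0.5, 0.4)
print(f"20 vs 20:   {small:.2f}")
print(f"200 vs 200: {large:.2f}")
```

This prints 0.82 and 8.16: the evidence for 0.5 over 0.4 is ten times stronger in the second experiment, although a ratio-only summary reports 0.5 for both.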

In an embodiment, a reference chromosome is used to determine the donor fraction and the noise level or its probability distribution. The instant method also works without the reference chromosome, and without fixing the donor fraction or the noise level to particular values.

Measurements of DNA are noisy and/or error-prone, especially measurements where the amount of DNA is small, or where the DNA is mixed with contaminating DNA. This noise results in less accurate genotypic data and less accurate transplant status determination. In some embodiments, platform modeling or some other method of noise modeling may be used to counter the deleterious effects of noise on the transplant status determination. The instant method uses a joint model of both channels, which accounts for the random noise due to the amount of input DNA, DNA quality, and/or protocol quality.

In particular, errors in the measurements typically do not depend only on the measured channel intensity ratio, so a model based on that ratio is reduced to using one-dimensional information. Accurate modeling of noise, channel quality, and channel interaction requires a two-dimensional joint model, which cannot be captured using allele ratios.

In particular, projecting the two-channel information onto the ratio $r = f(x, y) = x/y$ does not lend itself to accurate modeling of channel noise and bias. Noise on a particular SNP is not a function of the ratio; that is, $\mathrm{noise}(x, y)$ cannot be written as a function of $f(x, y)$ alone, but is in fact a joint function of both channels. For example, in the binomial model, the noise of the measured ratio has a variance of $r(1 - r)/(x + y)$, which is not a function purely of $r$. In such a model, where any channel bias or noise is included, suppose that at SNP $i$ the observed channel X value is $x = a_i X + b_i$, where $X$ is the true channel value and $b_i$ is the extra channel bias and random noise. Similarly, suppose that $y = c_i Y + d_i$. The observed ratio $r = x/y$ cannot accurately predict the true ratio $X/Y$ or model the leftover noise, since $(a_i X + b_i)/(c_i Y + d_i)$ is not a function of $X/Y$.
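The argument can be checked numerically: with a per-SNP additive offset, two SNPs that share the same true ratio X/Y produce different observed ratios, so the observed ratio alone cannot recover X/Y. The gain and offset values below are arbitrary illustrative numbers.

```python
def observed_ratio(X: float, Y: float, a: float, b: float, c: float, d: float) -> float:
    """Observed ratio x / y with per-SNP gain and offset: x = a*X + b, y = c*Y + d."""
    return (a * X + b) / (c * Y + d)


def binomial_ratio_variance(r: float, total: float) -> float:
    """Variance r(1 - r)/(x + y) of the measured ratio under the binomial model."""
    return r * (1.0 - r) / total


# Two SNPs with the same true ratio X/Y = 1 and the same bias terms.
low_signal = observed_ratio(100, 100, a=1.0, b=5.0, c=1.0, d=0.0)    # 1.05
high_signal = observed_ratio(1000, 1000, a=1.0, b=5.0, c=1.0, d=0.0)  # 1.005
```

The same holds for the binomial variance: at r = 0.5 it is 0.00625 with 40 total counts but 0.000625 with 400, which is why the model has to keep both channels rather than their ratio.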

The method disclosed herein describes an effective way to model noise and bias using joint binomial distributions of all of the measurement channels individually. Relevant equations may be found elsewhere in the document, in the sections that discuss per-SNP consistent bias, P(good), P(ref|bad), and P(mut|bad), which effectively adjust SNP behavior. In an embodiment, a method of the present disclosure uses a beta-binomial distribution, which avoids the limiting practice of relying only on the allele ratios and instead models the behavior based on both channel counts.

In an embodiment, a method disclosed herein can call the transplant status of a transplant recipient from genetic data found in transplant recipient plasma by using all available measurements. Some methods known in the art only use measured genetic data where the genotypic context is AA|BB, that is, where the donor and recipient are both homozygous at a given locus, but for different alleles. One problem with this approach is that only a small proportion of polymorphic loci, typically less than 10%, are from the AA|BB context. In an embodiment of a method disclosed herein, the method does not use genetic measurements of the transplant recipient plasma made at loci where the genotypic context is AA|BB. In an embodiment, the instant method uses plasma measurements for only those polymorphic loci with the AA|AB, AB|AA, and AB|AB genotypic contexts.
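Grouping loci by genotypic context and keeping only the contexts named above can be sketched as follows. The sketch assumes the label is written recipient|donor and that the A and B labels are interchangeable (so BB|AB is treated as AA|AB); both the ordering and the relabeling rule are assumptions for illustration.

```python
SWAP = str.maketrans("AB", "BA")
USED_CONTEXTS = {"AA|AB", "AB|AA", "AB|AB"}


def _normalize(genotype: str) -> str:
    """Write a genotype with its alleles in sorted order, e.g. 'BA' -> 'AB'."""
    return "".join(sorted(genotype))


def genotypic_context(recipient: str, donor: str) -> str:
    """Canonical 'recipient|donor' label, invariant to swapping the A and B labels."""
    as_is = f"{_normalize(recipient)}|{_normalize(donor)}"
    swapped = f"{_normalize(recipient.translate(SWAP))}|{_normalize(donor.translate(SWAP))}"
    return min(as_is, swapped)


def select_loci(loci):
    """Keep (recipient, donor, ...) records whose context is in USED_CONTEXTS."""
    return [locus for locus in loci
            if genotypic_context(locus[0], locus[1]) in USED_CONTEXTS]
```

Canonicalizing first means that a BB|AA locus is recognized as AA|BB and excluded, while BB|AB and AB|BB loci are kept as AA|AB and AB|AA respectively.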

Variable Read Depth to Minimize Sequencing Cost

In many clinical trials concerning a diagnostic, for example in Chiu et al. BMJ 2011; 342:c7401, a protocol with a number of parameters is set, and then the same protocol is executed with the same parameters for each of the patients in the trial. In the case of determining the transplant status of a transplant recipient using sequencing to measure genetic material, one pertinent parameter is the number of reads. The number of reads may refer to the number of actual reads, the number of intended reads, fractional lanes, full lanes, or full flow cells on a sequencer. In these studies, the number of reads is typically set at a level that will ensure that all or nearly all of the samples achieve the desired level of accuracy. Sequencing is currently an expensive technology, at a cost of roughly $200 per 5 million mappable reads, and while the price is dropping, any method that allows a sequencing-based diagnostic to operate at a similar level of accuracy but with fewer reads can save a considerable amount of money.

The accuracy of a transplant status determination typically depends on a number of factors, including the number of reads and the fraction of donor-derived DNA in the mixture. The accuracy is typically higher when the fraction of donor-derived DNA in the mixture is higher. At the same time, the accuracy is typically higher when the number of reads is greater. It is therefore possible for two cases to yield transplant status determinations of comparable accuracy when the first case has a lower fraction of donor-derived DNA in the mixture than the second, but more reads were sequenced in the first case than in the second. It is possible to use the estimated fraction of donor DNA in the mixture as a guide in determining the number of reads necessary to achieve a given level of accuracy.

In an embodiment of the present disclosure, a set of samples can be run where different samples in the set are sequenced to different read depths, wherein the number of reads run on each of the samples is chosen to achieve a given level of accuracy given the calculated fraction of donor DNA in each mixture. In an embodiment of the present disclosure, this may entail making a measurement of the mixed sample to determine the fraction of donor DNA in the mixture; this estimation of the donor fraction may be done with sequencing, with TAQMAN, with qPCR, with SNP arrays, or with any method that can distinguish different alleles at a given locus. The need for a donor fraction estimate may be eliminated by including hypotheses that cover all or a selected set of donor fractions in the set of hypotheses that are considered when comparing to the actual measured data. After the fraction of donor DNA in the mixture has been determined, the number of sequences to be read for each sample may be determined.
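A back-of-the-envelope way to turn an estimated donor fraction into a per-sample read budget is sketched below. Purely for illustration, it assumes accuracy is governed by the relative standard error of a binomial estimate of the donor fraction from informative reads, sqrt((1 − f)/(f·N)); solving for N gives N = (1 − f)/(f·c²) for a target relative error c. The function names and the 10% default target are hypothetical.

```python
from math import ceil


def reads_needed(donor_fraction: float, target_rel_se: float) -> int:
    """Informative reads N such that sqrt((1 - f) / (f * N)) <= target_rel_se."""
    f = donor_fraction
    return ceil((1.0 - f) / (f * target_rel_se ** 2))


def allocate_reads(donor_fractions: dict, target_rel_se: float = 0.1) -> dict:
    """Per-sample read budget: samples with lower donor fractions get more reads."""
    return {sample: reads_needed(f, target_rel_se)
            for sample, f in donor_fractions.items()}
```

Under these assumptions, `allocate_reads({"sample_1": 0.10, "sample_2": 0.01})` asks for roughly 900 informative reads for the first sample and roughly 9,900 for the second, rather than sequencing both to the depth the harder sample needs.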

Using Raw Genotyping Data

There are a number of methods that can accomplish the methods disclosed herein using donor genetic information measured on donor-derived DNA found in transplant recipient blood. Some of these methods involve making measurements of the donor-derived DNA using SNP arrays, some methods involve untargeted sequencing, and some methods involve targeted sequencing. The targeted sequencing may target SNPs, STRs, other polymorphic loci, non-polymorphic loci, or some combination thereof. Some of these methods may involve using a commercial or proprietary allele caller that calls the identity of the alleles from the intensity data that comes from the sensors in the machine doing the measuring. For example, the ILLUMINA INFINIUM system or the AFFYMETRIX GENECHIP microarray system involves beads or microchips with attached DNA sequences that can hybridize to complementary segments of DNA; upon hybridization, there is a change in the fluorescent properties of the sensor molecule that can be detected. There are also sequencing methods, for example the ILLUMINA SOLEXA GENOME SEQUENCER or the ABI SOLID GENOME SEQUENCER, wherein the genetic sequence of fragments of DNA is sequenced; upon extension of the strand of DNA complementary to the strand being sequenced, the identity of the extended nucleotide is typically detected via a fluorescent or radioactive tag appended to the complementary nucleotide. In all of these methods the genotypic or sequencing data is typically determined on the basis of fluorescent or other signals, or the lack thereof. These systems are typically combined with low-level software packages that make specific allele calls (secondary genetic data) from the analog output of the fluorescent or other detection device (primary genetic data). For example, in the case of a given allele on a SNP array, the software will make a call, for example, that a certain SNP is present or not present if the fluorescent intensity is measured above or below a certain threshold.
Similarly, the output of a sequencer is a chromatogram that indicates the level of fluorescence detected for each of the dyes, and the software will make a call that a certain base pair is A or T or C or G. High-throughput sequencers typically make a series of such measurements, called a read, that represents the most likely structure of the DNA sequence that was sequenced. The direct analog output of the chromatogram is defined here to be the primary genetic data, and the base pair/SNP calls made by the software are considered here to be the secondary genetic data. In an embodiment, primary data refers to the raw intensity data that is the unprocessed output of a genotyping platform, where the genotyping platform may refer to a SNP array or to a sequencing platform. The secondary genetic data refers to the processed genetic data, where an allele call has been made, or the sequence data has been assigned base pairs, and/or the sequence reads have been mapped to the genome.

Many higher-level applications take advantage of these allele calls, SNP calls, and sequence reads, that is, the secondary genetic data that the genotyping software produces. For example, DNA NEXUS, ELAND, or MAQ will take the sequencing reads and map them to the genome. In the context of non-invasive determination of transplant status, it may be possible to take a set of sequence reads that have been measured on DNA present in transplant recipient plasma and map them to the genome. One may then take a normalized count of the reads that are mapped to each chromosome, or section of a chromosome, and use that data to determine the transplant status of a transplant recipient.

However, the initial output of the measuring instruments is in reality an analog signal. When the software associated with the sequencer calls a certain base pair, for example a T, the call is the call that the software believes to be most likely. In some cases, however, the call may be of low confidence; for example, the analog signal may indicate that the particular base pair is only 90% likely to be a T and 10% likely to be an A. In another example, the genotype calling software that is associated with a SNP array reader may call a certain allele to be G, while the underlying analog signal indicates that it is only 70% likely that the allele is G and 30% likely that the allele is T. In these cases, when the higher-level applications use the genotype calls and sequence calls made by the lower-level software, they are losing some information. That is, the primary genetic data, as measured directly by the genotyping platform, may be messier than the secondary genetic data that is determined by the attached software packages, but it contains more information. In mapping the secondary genetic data sequences to the genome, many reads are thrown out because some bases are not read with enough clarity and/or the mapping is not clear. When the primary genetic data sequence reads are used, all or many of those reads that may have been thrown out when first converted to secondary genetic data can be used by treating the reads in a probabilistic manner.

In an embodiment of the present disclosure, the higher-level software does not rely on the allele calls, SNP calls, or sequence reads that are determined by the lower-level software. Instead, the higher-level software bases its calculations on the analog signals directly measured from the genotyping platform. In an embodiment of the present disclosure, all genetic calls, SNP calls, sequence reads, and sequence mappings are treated in a probabilistic manner by using the raw intensity data as measured directly by the genotyping platform, rather than converting the primary genetic data to secondary genetic calls. In an embodiment, the DNA measurements from the prepared sample used in calculating allele count probabilities and determining the relative probability of each hypothesis comprise primary genetic data.
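One way to keep per-call uncertainty instead of discarding it is to count alleles softly, weighting each read by the probability that its base matches each allele. The sketch below uses Phred base qualities (error probability 10^(−Q/10)) as a readily available stand-in for the raw intensity data described above, and assumes the error probability is spread evenly over the three other bases; both are simplifying assumptions, not the disclosed model.

```python
def base_probabilities(called_base: str, phred_q: float) -> dict:
    """Probability of each base given a call with Phred quality phred_q."""
    p_err = 10.0 ** (-phred_q / 10.0)
    probs = {base: p_err / 3.0 for base in "ACGT"}  # error shared evenly
    probs[called_base] = 1.0 - p_err
    return probs


def soft_allele_counts(observations, allele_a: str, allele_b: str):
    """Expected counts of allele_a and allele_b over (called_base, quality) pairs."""
    n_a = n_b = 0.0
    for base, quality in observations:
        probs = base_probabilities(base, quality)
        n_a += probs[allele_a]
        n_b += probs[allele_b]
    return n_a, n_b
```

For three reads called T (Q30), T (Q10), and A (Q20) at a T/A SNP, the soft counts are about 1.90 T and 1.02 A, so the low-quality T call contributes 0.9 rather than a full observation. These fractional counts can be passed to the same likelihood models that use integer counts.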

In some embodiments, the method can increase the accuracy of genetic data of a target individual which incorporates genetic data of at least one related individual, the method comprising obtaining primary genetic data specific to a target individual's genome and genetic data specific to the genome(s) of the related individual(s), creating a set of one or more hypotheses concerning possibly which segments of which chromosomes from the related individual(s) correspond to those segments in the target individual's genome, determining the probability of each of the hypotheses given the target individual's primary genetic data and the related individual(s)'s genetic data, and using the probabilities associated with each hypothesis to determine the most likely state of the actual genetic material of the target individual. In an embodiment, a method of the present disclosure can determine an allelic state in a set of alleles, in a target individual, and from one or both parents of the target individual, and optionally from one or more related individuals, the method comprising obtaining primary genetic data from the target individual, and from the one or both parents, and from any related individuals, creating a set of at least one allelic hypothesis for the target individual, and for the one or both parents, and optionally for the one or more related individuals, where the hypotheses describe possible allelic states in the set of alleles, determining a statistical probability for each allelic hypothesis in the set of hypotheses given the obtained genetic data, and determining the allelic state for each of the alleles in the set of alleles for the target individual, and for the one or both parents, and optionally for the one or more related individuals, based on the statistical probabilities of each of the allelic hypotheses.

In some embodiments, the genetic data of the mixed sample may comprise sequence data wherein the sequence data may not uniquely map to the human genome. In some embodiments, the genetic data of the mixed sample may comprise sequence data wherein the sequence data maps to a plurality of locations in the genome, wherein each possible mapping is associated with a probability that the given mapping is correct. In some embodiments, the sequence reads are not assumed to be associated with a particular position in the genome. In some embodiments, the sequence reads are associated with a plurality of positions in the genome, each with an associated probability that the read belongs to that position.
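Reads that map to several locations can contribute fractionally to each candidate locus in proportion to its mapping probability, rather than being discarded. The sketch renormalizes each read's probabilities to sum to one (an assumption, in case the aligner's probabilities do not), and the locus names in the example are hypothetical.

```python
from collections import defaultdict


def fractional_locus_counts(reads):
    """Sum mapping probabilities per locus.

    `reads` is a list with one entry per read; each entry is a list of
    (locus, probability) candidate mappings for that read.
    """
    counts = defaultdict(float)
    for candidates in reads:
        total = sum(p for _, p in candidates)
        for locus, p in candidates:
            counts[locus] += p / total
    return dict(counts)
```

For reads `[[("chr1:100", 1.0)], [("chr1:100", 0.75), ("chr7:500", 0.25)], [("chr7:500", 0.5), ("chr2:42", 0.5)]]`, the counts are 1.75, 0.75, and 0.5, and they still sum to the three reads observed.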

Combining Methods of Transplant Status Determination

Disclosed herein is a method for making more accurate predictions about the genetic state of a transplant that comprises combining predictions of transplant state with other known methods of making such a determination. For example, serum creatinine levels have previously been used to try to determine the status of a kidney transplant. See FIG. 7.

There are many ways to combine the predictions; for example, one could convert the hormone measurements into a multiple of the median (MoM) and then into likelihood ratios (LRs). Similarly, other measurements could be transformed into LRs using the mixture model of NT distributions. Detection rates (DRs) and false-positive rates (FPRs) could be calculated by taking the proportions with risks above a given risk threshold.

In an embodiment, it is possible to invoke the central limit theorem to assume that the distribution g(y|a or e) is Gaussian, and to estimate its mean and standard deviation from multiple samples. In another embodiment, one could assume that the measurements are not independent given the outcome and collect enough samples to estimate the joint distribution p(x₁, x₂, x₃, x₄|a or e).
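The Gaussian embodiment can be sketched with Python's `statistics.NormalDist`: fit one normal distribution per outcome and per marker from training samples, convert a new measurement into a likelihood ratio, and multiply likelihood ratios across markers. Multiplying assumes the markers are conditionally independent given the outcome, which corresponds to the first embodiment above rather than the joint-distribution alternative; the outcome labels ("rejection" and "stable") and any training values are hypothetical.

```python
from statistics import NormalDist


def fit_marker(values_rejection, values_stable):
    """Fit one Gaussian per outcome for a single marker from training samples."""
    return NormalDist.from_samples(values_rejection), NormalDist.from_samples(values_stable)


def combined_likelihood_ratio(measurements, fitted_markers):
    """Product of per-marker likelihood ratios, assuming conditional independence."""
    lr = 1.0
    for x, (dist_rejection, dist_stable) in zip(measurements, fitted_markers):
        lr *= dist_rejection.pdf(x) / dist_stable.pdf(x)
    return lr


def detection_and_false_positive_rates(scores_affected, scores_unaffected, threshold):
    """Proportions of affected (DR) and unaffected (FPR) samples scoring above threshold."""
    dr = sum(s > threshold for s in scores_affected) / len(scores_affected)
    fpr = sum(s > threshold for s in scores_unaffected) / len(scores_unaffected)
    return dr, fpr
```

Sweeping `threshold` over a range of values and recording each (DR, FPR) pair traces out the trade-off curve used to pick an operating point.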

In an embodiment, the transplant status is determined to be the transplant status that is associated with the hypothesis whose probability is the greatest. In some cases, one hypothesis will have a normalized, combined probability greater than 90%. Each hypothesis is associated with one transplant status or a set of transplant statuses, and the transplant status associated with the hypothesis whose normalized, combined probability is greater than 90% may be called as the determined transplant status. Another threshold value, such as 50%, 80%, 95%, 98%, 99%, or 99.9%, may instead be chosen as the threshold required for a hypothesis to be called as the determined transplant status.
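Normalizing hypothesis likelihoods and applying a calling threshold can be sketched as below. The sketch assumes the inputs are log-likelihoods with equal prior weight on each hypothesis (or with priors already folded in); the hypothesis names are hypothetical, and returning `None` for a below-threshold result is one possible no-call convention.

```python
from math import exp


def normalized_probabilities(log_likelihoods: dict) -> dict:
    """Convert per-hypothesis log-likelihoods into probabilities that sum to 1."""
    best = max(log_likelihoods.values())  # subtract the max for numerical stability
    weights = {h: exp(ll - best) for h, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def call_status(log_likelihoods: dict, threshold: float = 0.9):
    """Return (hypothesis, probability) if the best hypothesis clears the threshold,
    otherwise (None, probability) as a no-call."""
    probs = normalized_probabilities(log_likelihoods)
    best = max(probs, key=probs.get)
    return (best, probs[best]) if probs[best] > threshold else (None, probs[best])
```

With log-likelihoods of 0 and −5 the leading hypothesis has a normalized probability of about 0.993 and is called; with 0 and −1 it reaches only about 0.73, so no call is made at the 90% threshold.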

Determining the Number of DNA Molecules in a Sample

A method is described herein to determine the number of DNA molecules in a sample by generating a uniquely identified molecule for each original DNA molecule in the sample during the first round of DNA amplification. Described here is a procedure to accomplish this, followed by a single-molecule or clonal sequencing method.

The approach entails targeting one or more specific loci and generating a tagged copy of the original molecules in such a manner that most or all of the tagged molecules from each targeted locus will have a unique tag and can be distinguished from one another upon sequencing of this barcode using clonal or single-molecule sequencing. Each unique sequenced barcode represents a unique molecule in the original sample. Simultaneously, sequencing data is used to ascertain the locus from which the molecule originates. Using this information, one can determine the number of unique molecules in the original sample for each locus.
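The counting step can be sketched as collapsing reads that share a locus and a barcode. The (locus, barcode) input layout is an assumption, and this minimal version treats every distinct barcode string as a distinct molecule; in practice, sequencing errors inside the barcode would inflate the count unless near-identical barcodes are merged first.

```python
from collections import defaultdict


def unique_molecules_per_locus(reads):
    """Number of distinct barcodes seen at each locus.

    `reads` is an iterable of (locus, barcode) pairs, one per sequenced read.
    """
    barcodes = defaultdict(set)
    for locus, barcode in reads:
        barcodes[locus].add(barcode)
    return {locus: len(codes) for locus, codes in barcodes.items()}
```

Six reads such as two copies of ("locus1", "ACGTA"), one ("locus1", "TTGCA"), one ("locus2", "ACGTA"), and two ("locus2", "GGGCC") collapse to two original molecules at each locus.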

This method can be used for any application in which quantitative evaluation of the number of molecules in an original sample is required. Furthermore, the number of unique molecules of one or more targets can be related to the number of unique molecules of one or more other targets to determine the relative copy number, allele distribution, or allele ratio. Alternatively, the number of copies detected from various targets can be modeled by a distribution in order to identify the most likely number of copies of the original targets. Applications include, but are not limited to, detection of insertions and deletions such as those found in carriers of Duchenne Muscular Dystrophy; quantitation of deletions or duplications of chromosome segments such as those observed in copy number variants; chromosome copy number of samples from born individuals; and chromosome copy number of samples from unborn individuals such as embryos or fetuses.

The method can be combined with simultaneous evaluation of variations contained in the targeted sequence. This can be used to determine the number of molecules representing each allele in the original sample.

In an embodiment, the method as it pertains to a single target locus may comprise one or more of the following steps: (1) Designing a standard pair of oligomers for PCR amplification of a specific locus. (2) Adding, during synthesis, a sequence of specified bases with no or minimal complementarity to the target locus or genome to the 5′ end of one of the target-specific oligomers. This sequence, termed the tail, is a known sequence to be used for subsequent amplification, and it is followed by a sequence of random nucleotides. These random nucleotides comprise the random region. The random region comprises a randomly generated sequence of nucleic acids that probabilistically differs between each probe molecule. Consequently, following synthesis, the tailed oligomer pool will consist of a collection of oligomers beginning with a known sequence, followed by an unknown sequence that differs between molecules, followed by the target-specific sequence. (3) Performing one round of amplification (denaturation, annealing, extension) using only the tailed oligomer. (4) Adding exonuclease to the reaction, effectively stopping the PCR reaction, and incubating the reaction at the appropriate temperature to remove forward single-stranded oligos that did not anneal to the template and extend to form a double-stranded product. (5) Incubating the reaction at a high temperature to denature the exonuclease and eliminate its activity. (6) Adding to the reaction a new oligonucleotide that is complementary to the tail of the oligomer used in the first reaction, along with the other target-specific oligomer, to enable PCR amplification of the product generated in the first round of PCR. (7) Continuing amplification to generate enough product for downstream clonal sequencing. (8) Measuring the amplified PCR product by any of a multitude of methods, for example clonal sequencing, to a sufficient number of bases to span the sequence.

In an embodiment, a method of the present disclosure involves targeting multiple loci in parallel or otherwise. Primers to different target loci can be generated independently and mixed to create multiplex PCR pools. In an embodiment, original samples can be divided into sub-pools and different loci can be targeted in each sub-pool before being recombined and sequenced. In an embodiment, the tagging step and a number of amplification cycles may be performed before the pool is subdivided, to ensure efficient targeting of all targets before splitting and to improve subsequent amplification by continuing amplification using smaller sets of primers in the subdivided pools.

In some circumstances, especially where there is a very small amount of DNA, for example fewer than 5,000 copies of the genome, fewer than 1,000 copies of the genome, fewer than 500 copies of the genome, or fewer than 100 copies of the genome, one can encounter a phenomenon called bottlenecking. This is where there are a small number of copies of any given allele in the initial sample, and amplification biases can result in the amplified pool of DNA having significantly different ratios of those alleles than are in the initial mixture of DNA. By applying a unique or nearly unique set of barcodes to each strand of DNA before standard PCR amplification, it is possible to exclude n−1 copies of DNA from a set of n identical molecules of sequenced DNA that originated from the same original molecule.

For example, imagine a heterozygous SNP in the genome of an individual, and a mixture of DNA from the individual where ten molecules of each allele are present in the original sample of DNA. After amplification there may be 100,000 molecules of DNA corresponding to that locus. Due to stochastic processes, the ratio of DNA could be anywhere from 1:2 to 2:1; however, since each of the original molecules was tagged with a unique tag, it would be possible to determine that the DNA in the amplified pool originated from exactly 10 molecules of DNA from each allele. This method would therefore give a more accurate measure of the relative amounts of each allele than a method not using this approach. For methods where it is desirable for the relative amount of allele bias to be minimized, this method will provide more accurate data.
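A toy simulation illustrates why counting unique tags is robust to bottlenecking. The amplification model (each molecule copied independently with a fixed probability per cycle), the cycle count, the efficiency, and the seed are simplifying assumptions; the point is only that the raw A:B read ratio drifts with the random draws while the number of distinct tags stays at 10:10.

```python
import random
from collections import Counter


def simulate_pcr(molecules, cycles, efficiency, rng):
    """Each cycle, every molecule in the pool is copied with probability `efficiency`."""
    pool = list(molecules)
    for _ in range(cycles):
        pool.extend([m for m in pool if rng.random() < efficiency])
    return pool


rng = random.Random(7)
# Ten original molecules of each allele, each carrying its own unique tag.
originals = [(allele, f"{allele}{i}") for allele in "AB" for i in range(10)]
amplified = simulate_pcr(originals, cycles=12, efficiency=0.8, rng=rng)

raw_counts = Counter(allele for allele, _ in amplified)
unique_tags = Counter(allele for allele, _ in set(amplified))
print("raw A:B =", raw_counts["A"], ":", raw_counts["B"])
print("unique tags A:B =", unique_tags["A"], ":", unique_tags["B"])
```

Changing the seed changes the raw counts, but collapsing the amplified pool to its distinct (allele, tag) pairs always recovers the 10:10 composition of the original sample.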

Association of the sequenced fragment with the target locus can be achieved in a number of ways. In an embodiment, a sequence of sufficient length is obtained from the targeted fragment to span the molecule barcode as well as a sufficient number of unique bases corresponding to the target sequence to allow unambiguous identification of the target locus. In another embodiment, the molecular barcoding primer that contains the randomly generated molecular barcode can also contain a locus-specific barcode (locus barcode) that identifies the target with which it is to be associated. This locus barcode would be identical among all molecular barcoding primers for each individual target, and hence among all resulting amplicons, but different from those of all other targets. In an embodiment, the tagging method described herein may be combined with a one-sided nesting protocol.

In an embodiment, the design and generation of molecular barcoding primers may be reduced to practice as follows: the molecular barcoding primers may consist of a sequence that is not complementary to the target sequence, followed by a random molecular barcode region, followed by a target-specific sequence. The sequence 5′ of the molecular barcode may be used for subsequent PCR amplification and may comprise sequences useful in the conversion of the amplicon to a library for sequencing. The random molecular barcode sequence could be generated in a multitude of ways. The preferred method synthesizes the molecule-tagging primer in such a way as to include all four bases in the reaction during synthesis of the barcode region. All or various combinations of bases may be specified using the IUPAC DNA ambiguity codes. In this manner, the synthesized collection of molecules will contain a random mixture of sequences in the molecular barcode region. The length of the barcode region will determine how many primers will contain unique barcodes. The number of unique sequences is related to the length of the barcode region as N^L, where N is the number of bases, typically 4, and L is the length of the barcode. A barcode of five bases can yield up to 1024 unique sequences; a barcode of eight bases can yield 65536 unique barcodes. In an embodiment, the DNA can be measured by a sequencing method, where the sequence data represents the sequence of a single molecule. This can include methods in which single molecules are sequenced directly or methods in which single molecules are amplified to form clones detectable by the sequencing instrument, but that still represent single molecules, herein called clonal sequencing.
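The N^L relationship, and the practical consequence that randomly synthesized barcodes can collide, can be checked with a few lines of code. The uniform-draw model is an assumption (real degenerate synthesis has base-composition biases), and the function names are hypothetical.

```python
import random


def barcode_space(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct barcodes of a given length: N ** L."""
    return alphabet_size ** length


def expected_distinct_barcodes(n_molecules: int, length: int, alphabet_size: int = 4) -> float:
    """Expected number of distinct barcodes when each molecule draws one uniformly at random."""
    space = barcode_space(length, alphabet_size)
    return space * (1.0 - (1.0 - 1.0 / space) ** n_molecules)


def random_barcode(length: int, rng: random.Random) -> str:
    """Mimic synthesis with an equimolar A/C/G/T mix (IUPAC 'N') at each barcode position."""
    return "".join(rng.choice("ACGT") for _ in range(length))
```

For 1,000 tagged molecules, an 8-base barcode (65,536 possibilities) is expected to yield about 992 distinct tags, while a 5-base barcode (1,024 possibilities) yields only about 640, so many molecules would share a tag.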

In some embodiments, the molecular barcodes described herein are Molecular Index Tags (“MITs”), which are attached to a population of nucleic acid molecules from a sample to identify individual sample nucleic acid molecules from the population of nucleic acid molecules (i.e., members of the population) after sample processing for a sequencing reaction. MITs are described in detail in U.S. Pat. No. 10,011,870 to Zimmermann et al., which is incorporated herein by reference in its entirety. Unlike prior art methods that relate to unique identifiers and teach having a diversity of unique identifiers that is greater than the number of sample nucleic acid molecules in a sample in order to tag each sample nucleic acid molecule with a unique identifier, the present disclosure typically involves many more sample nucleic acid molecules than the diversity of MITs in a set of MITs. In fact, methods and compositions herein can include more than 1,000, 1×10⁶, 1×10⁹, or even more starting molecules for each different MIT in a set of MITs. Yet the methods can still identify individual sample nucleic acid molecules that give rise to a tagged nucleic acid molecule after amplification.

In the methods and compositions herein, the diversity of the set of MITs is advantageously less than the total number of sample nucleic acid molecules that span a target locus, but the diversity of the possible combinations of attached MITs using the set of MITs is greater than the total number of sample nucleic acid molecules that span a target locus. Typically, to improve the identifying capability of the set of MITs, at least two MITs are attached to a sample nucleic acid molecule to form a tagged nucleic acid molecule. The sequences of attached MITs determined from sequencing reads can be used to identify clonally amplified identical copies of the same sample nucleic acid molecule that are attached to different solid supports or different regions of a solid support during sample preparation for the sequencing reaction. The sequences of tagged nucleic acid molecules can be compiled, compared, and used to differentiate nucleotide mutations incurred during amplification from nucleotide differences present in the initial sample nucleic acid molecules.

Sets of MITs in the present disclosure typically have a lower diversity than the total number of sample nucleic acid molecules, whereas many prior methods utilized sets of “unique identifiers” where the diversity of the unique identifiers was greater than the total number of sample nucleic acid molecules. Yet MITs of the present disclosure retain sufficient tracking power by including a diversity of possible combinations of attached MITs using the set of MITs that is greater than the total number of sample nucleic acid molecules that span a target locus. This lower diversity for a set of MITs of the present disclosure significantly reduces the cost and manufacturing complexity associated with generating and/or obtaining sets of tracking tags. Although the total number of MIT molecules in a reaction mixture is typically greater than the total number of sample nucleic acid molecules, the diversity of the set of MITs is far less than the total number of sample nucleic acid molecules, which substantially lowers the cost and simplifies the manufacturability over prior art methods. Thus, a set of MITs can include a diversity of as few as 3, 4, 5, 10, 25, 50, or 100 different MITs on the low end of the range and 10, 25, 50, 100, 200, 250, 500, or 1000 MITs on the high end of the range, for example. Accordingly, in the present disclosure this relatively low diversity of MITs results in a far lower diversity of MITs than the total number of sample nucleic acid molecules, which, in combination with a greater total number of MITs in the reaction mixture than total sample nucleic acid molecules and a higher diversity in the possible combinations of any 2 MITs of the set of MITs than the number of sample nucleic acid molecules that span a target locus, provides a particularly advantageous embodiment that is cost-effective and very effective with complex samples isolated from nature.
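The three relationships in the preceding paragraph can be written as a simple check. The sketch assumes two MITs per molecule at distinguishable positions (one 5′, one 3′), so the number of combinations is the number of ordered pairs, D² for a set of D MITs; the example quantities are hypothetical.

```python
def pair_combinations(mit_diversity: int, ordered: bool = True) -> int:
    """Distinct two-MIT combinations available from a set of mit_diversity MITs."""
    d = mit_diversity
    return d * d if ordered else d * (d + 1) // 2


def satisfies_mit_design(mit_diversity: int, total_molecules: int,
                         molecules_at_locus: int, total_mit_molecules: int) -> bool:
    """Check the low-diversity, excess-tag, and pair-diversity conditions together."""
    return (
        mit_diversity < total_molecules                            # low tag diversity
        and total_mit_molecules > total_molecules                  # tags in excess
        and pair_combinations(mit_diversity) > molecules_at_locus  # pairs still discriminate
    )
```

With 100 MITs there are 10,000 ordered pairs, enough to discriminate, for example, 5,000 molecules spanning a locus; 50 MITs give only 2,500 pairs and would fail the same check.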

In some embodiments, the population of nucleic acid molecules has not been amplified in vitro before attaching the MITs and can include between 1×10⁸ and 1×10¹³, or in some embodiments, between 1×10⁹ and 1×10¹² or between 1×10¹⁰ and 1×10¹², sample nucleic acid molecules. In some embodiments, a reaction mixture is formed including the population of nucleic acid molecules and a set of MITs, wherein the total number of nucleic acid molecules in the population of nucleic acid molecules is greater than the diversity of MITs in the set of MITs and wherein there are at least three MITs in the set. In some embodiments, the diversity of the possible combinations of attached MITs using the set of MITs is more than the total number of sample nucleic acid molecules that span a target locus and less than the total number of sample nucleic acid molecules in the population. In some embodiments, the set of MITs can include between 10 and 500 MITs with different sequences. The ratio of the total number of nucleic acid molecules in the population of nucleic acid molecules in the sample to the diversity of MITs in the set, in certain methods and compositions herein, can be between 1,000:1 and 1,000,000,000:1. The ratio of the diversity of the possible combinations of attached MITs using the set of MITs to the total number of sample nucleic acid molecules that span a target locus can be between 1.01:1 and 10:1. The MITs typically are composed at least in part of an oligonucleotide between 4 and 20 nucleotides in length, as discussed in more detail herein. The set of MITs can be designed such that the sequences of all the MITs in the set differ from each other by at least 2, 3, 4, or 5 nucleotides.

In some embodiments provided herein, at least one (e.g. 2, 3, 5, 10, 20, 30, 50, 100) MIT from the set of MITs is attached to each nucleic acid molecule, or to a segment of each nucleic acid molecule, of the population of nucleic acid molecules to form a population of tagged nucleic acid molecules. MITs can be attached to a sample nucleic acid molecule in various configurations, as discussed further herein. For example, after attachment, one MIT can be located on the 5′ terminus of the tagged nucleic acid molecules or 5′ to the sample nucleic acid segment of some, most, or typically each of the tagged nucleic acid molecules, and/or another MIT can be located 3′ to the sample nucleic acid segment of some, most, or typically each of the tagged nucleic acid molecules. In other embodiments, at least two MITs are located 5′ and/or 3′ to the sample nucleic acid segments of the tagged nucleic acid molecules, or 5′ and/or 3′ to the sample nucleic acid segment of some, most, or typically each of the tagged nucleic acid molecules. Two MITs can be added to either the 5′ or the 3′ end, either by including both on the same polynucleotide segment before attaching or by performing separate reactions. For example, PCR can be performed with primers that bind to specific sequences within the sample nucleic acid molecules and include a region 5′ to the sequence-specific region that encodes two MITs. In some embodiments, at least one copy of each MIT of the set of MITs is attached to a sample nucleic acid molecule, two copies of at least one MIT are each attached to a different sample nucleic acid molecule, and/or at least two sample nucleic acid molecules with the same or substantially the same sequence have at least one different MIT attached. A skilled artisan will identify methods for attaching MITs to nucleic acid molecules of a population of nucleic acid molecules.
For example, MITs can be attached through ligation, or appended 5′ to an internal sequence binding site of a PCR primer and attached during a PCR reaction, as discussed in more detail herein.
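
The benefit of a combination diversity that exceeds the number of molecules spanning a locus can be explored with a small simulation. The sketch below is a simplified model, not a method of the present disclosure: it assumes each molecule receives a 5′ MIT and a 3′ MIT drawn independently and uniformly from the set, and it ignores any other information, such as a portion of the sample segment, that may also be used to distinguish molecules. The function names are hypothetical.

```python
import random
from collections import Counter


def simulate_shared_pair_fraction(n_mits: int, n_molecules: int,
                                  seed: int = 0) -> float:
    """Attach a uniformly random (5' MIT, 3' MIT) pair to each molecule that
    spans one target locus; return the fraction of molecules whose MIT pair
    is also carried by at least one other molecule at that locus."""
    rng = random.Random(seed)
    pairs = [(rng.randrange(n_mits), rng.randrange(n_mits))
             for _ in range(n_molecules)]
    counts = Counter(pairs)
    shared = sum(c for c in counts.values() if c > 1)
    return shared / n_molecules


def expected_shared_fraction(n_mits: int, n_molecules: int) -> float:
    """Expectation under the same uniform model: a molecule shares its pair
    if any of the other n_molecules - 1 molecules draws the same pair."""
    combos = n_mits ** 2
    return 1.0 - (1.0 - 1.0 / combos) ** (n_molecules - 1)


for n in (10, 100, 1000):
    sim = simulate_shared_pair_fraction(n, 10_000)
    exp = expected_shared_fraction(n, 10_000)
    print(f"{n:>4} MITs: simulated {sim:.3f}, expected {exp:.3f}")
```

Under this model, with 100 MITs (10,000 combinations) and 10,000 molecules spanning the locus, about 63% of molecules are expected to share their MIT pair with at least one other molecule, whereas with 1,000 MITs (1,000,000 combinations) the expected fraction drops to about 1%.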

After or while MITs are attached to sample nucleic acids to form tagged nucleic acid molecules, the population of tagged nucleic acid molecules is typically amplified to create a library of tagged nucleic acid molecules. Methods for amplification to generate a library, including those particularly relevant to a high-throughput sequencing workflow, are known in the art. For example, such amplification can be a PCR-based library preparation. These methods can further include clonally amplifying the library of tagged nucleic acid molecules onto one or more solid supports using PCR or another amplification method, such as an isothermal method. Methods for generating clonally amplified libraries onto solid supports in high-throughput sequencing sample preparation workflows are known in the art. Additional amplification steps, such as a multiplex amplification reaction in which a subset of the population of sample nucleic acid molecules is amplified, can be included in methods for identifying sample nucleic acids provided herein as well.

In some embodiments, a nucleotide sequence of the MITs and at least a portion of the sample nucleic acid molecule segments of some, most, or all (e.g. at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 25, 50, 75, 100, 150, 200, 250, 500, 1,000, 2,500, 5,000, 10,000, 15,000, 20,000, 25,000, 50,000, 100,000, 1,000,000, 5,000,000, 10,000,000, 25,000,000, 50,000,000, 100,000,000, 250,000,000, 500,000,000, 1×10⁹, 1×10¹⁰, 1×10¹¹, 1×10¹², or 1×10¹³ tagged nucleic acid molecules, or between 10, 20, 25, 30, 40, 50, 60, 70, 80, or 90% of the tagged nucleic acid molecules on the low end of the range and 20, 25, 30, 40, 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 100% on the high end of the range) of the tagged nucleic acid molecules in the library of tagged nucleic acid molecules is then determined. The sequence of a first MIT, and optionally a second or further MITs, on clonally amplified copies of a tagged nucleic acid molecule can be used to identify the individual sample nucleic acid molecule that gave rise to the clonally amplified tagged nucleic acid molecule in the library.
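
The bucketing of sequencing reads by their attached MITs can be sketched as follows. This minimal Python example assumes a hypothetical read representation in which the first MIT, the second MIT, and the sample segment have already been parsed out of each read; a portion of the sample segment can optionally be included in the family key.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical read layout: (first MIT, second MIT, sample segment sequence).
Read = Tuple[str, str, str]


def group_into_families(reads: List[Read],
                        segment_prefix_len: int = 0) -> Dict[tuple, List[str]]:
    """Bucket reads that share the same first and second MIT. Optionally, the
    first `segment_prefix_len` bases of the sample segment are added to the
    key, so that different molecules carrying the same MIT pair stay apart."""
    families: Dict[tuple, List[str]] = defaultdict(list)
    for mit1, mit2, segment in reads:
        key = (mit1, mit2, segment[:segment_prefix_len])
        families[key].append(segment)
    return dict(families)


reads = [
    ("ACGTA", "TTGCA", "GATTACAGG"),
    ("ACGTA", "TTGCA", "GATTACAGG"),
    ("ACGTA", "TTGCA", "CCTTACAGG"),
    ("CAGTC", "TTGCA", "GATTGCAGG"),
]
print(len(group_into_families(reads)))                        # 2
print(len(group_into_families(reads, segment_prefix_len=3)))  # 3
```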

In some embodiments, sequences determined from tagged nucleic acid molecules sharing the same first and optionally the same second MIT can be used to identify amplification errors by differentiating amplification errors from true sequence differences at target loci in the sample nucleic acid molecules. For example, in some embodiments, the set of MITs is a set of double-stranded MITs that can, for example, be a portion of a partially or fully double-stranded adapter, such as a Y-adapter. In these embodiments, for every starting molecule, a Y-adapter preparation generates 2 daughter molecule types, one in a + and one in a − orientation. A true mutation in a sample molecule should have both daughter molecules paired with the same 2 MITs in these embodiments, where the MITs are a double-stranded adapter or a portion thereof. Additionally, when the sequences for the tagged nucleic acid molecules are determined and bucketed by the MITs on the sequences into MIT nucleic acid segment families, considering the MIT sequence and optionally its complement for double-stranded MITs, and optionally considering at least a portion of the nucleic acid segment, then most, and typically at least 75% in double-stranded MIT embodiments, of the nucleic acid segments in an MIT nucleic acid segment family will include the mutation if the starting molecule that gave rise to the tagged nucleic acid molecules had the mutation. In the event of an amplification (e.g. PCR) error, the worst-case scenario is that the error occurs in cycle 1 of the first PCR.
In these embodiments, such an amplification error will cause 25% of the final product to contain the error, because the erroneous copy is one of the four strands present after that first cycle (plus any additional accumulated error, but this should be <<1%). Therefore, in some embodiments, if at least 75% of the reads in an MIT nucleic acid segment family support a particular mutation or polymorphic allele, for example, it can be concluded that the mutation or polymorphic allele is truly present in the sample nucleic acid molecule that gave rise to the tagged nucleic acid molecule. The later an error occurs in a sample preparation process, the lower the proportion of sequence reads that include the error in a set of sequencing reads grouped (i.e. bucketed) by MITs into a paired MIT nucleic acid segment family. For example, an error in a library preparation amplification will result in a higher percentage of sequences with the error in a paired MIT nucleic acid segment family than an error in a subsequent amplification step in the workflow, such as a targeted multiplex amplification. An error in the final clonal amplification in a sequencing workflow creates the lowest percentage of nucleic acid molecules in a paired MIT nucleic acid segment family that include the error.
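
The 75% rule can be sketched in Python as follows. The helper `idealized_error_fraction` (a hypothetical name) assumes a family derived from two template strands with perfectly efficient copying in every cycle, which reproduces the 25% worst case for a cycle-1 error and shows why later errors affect a smaller share of reads; `call_family_allele` (also hypothetical) applies the 75% threshold to the bases observed at one position within an MIT nucleic acid segment family.

```python
from collections import Counter
from typing import List, Optional


def idealized_error_fraction(cycle: int) -> float:
    """Fraction of a family's final product carrying an error introduced in
    the given cycle of the first PCR, assuming the family starts from two
    template strands and every strand is copied perfectly in every cycle:
    after cycle c there are 2 * 2**c strands and one of them is erroneous."""
    return 1.0 / (2 * 2 ** cycle)


def call_family_allele(bases: List[str], threshold: float = 0.75) -> Optional[str]:
    """Return the base observed at one position across an MIT family if at
    least `threshold` of the family's reads support it, else None."""
    if not bases:
        return None
    allele, count = Counter(bases).most_common(1)[0]
    return allele if count / len(bases) >= threshold else None


print(idealized_error_fraction(1))       # 0.25
print(call_family_allele(list("TTTT")))  # T
print(call_family_allele(list("TTTA")))  # T (3 of 4 reads = 75%)
print(call_family_allele(list("TTAA")))  # None
```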

In some embodiments disclosed herein, the ratio of the total number of the sample nucleic acid molecules to the diversity of the MITs in the set of MITs, or to the diversity of the possible combinations of attached MITs using the set of MITs, can be between 10:1, 20:1, 30:1, 40:1, 50:1, 60:1, 70:1, 80:1, 90:1, 100:1, 200:1, 300:1, 400:1, 500:1, 600:1, 700:1, 800:1, 900:1, 1,000:1, 2,000:1, 3,000:1, 4,000:1, 5,000:1, 6,000:1, 7,000:1, 8,000:1, 9,000:1, 10,000:1, 15,000:1, 20,000:1, 25,000:1, 30,000:1, 40,000:1, 50,000:1, 60,000:1, 70,000:1, 80,000:1, 90,000:1, 100,000:1, 200,000:1, 300,000:1, 400,000:1, 500,000:1, 600,000:1, 700,000:1, 800,000:1, 900,000:1, and 1,000,000:1 on the low end of the range and 100:1, 200:1, 300:1, 400:1, 500:1, 600:1, 700:1, 800:1, 900:1, 1,000:1, 2,000:1, 3,000:1, 4,000:1, 5,000:1, 6,000:1, 7,000:1, 8,000:1, 9,000:1, 10,000:1, 15,000:1, 20,000:1, 25,000:1, 30,000:1, 40,000:1, 50,000:1, 60,000:1, 70,000:1, 80,000:1, 90,000:1, 100,000:1, 200,000:1, 300,000:1, 400,000:1, 500,000:1, 600,000:1, 700,000:1, 800,000:1, 900,000:1, 1,000,000:1, 2,000,000:1, 3,000,000:1, 4,000,000:1, 5,000,000:1, 6,000,000:1, 7,000,000:1, 8,000,000:1, 9,000,000:1, 10,000,000:1, 50,000,000:1, 100,000,000:1, and 1,000,000,000:1 on the high end of the range.

In some embodiments, the sample is a human cfDNA sample. In such a method, as disclosed herein, the diversity is between about 20 million and about 3 billion. In these embodiments, the ratio of the total number of sample nucleic acid molecules to the diversity of the set of MITs can be between 100,000:1, 1×10⁶:1, 1×10⁷:1, 2×10⁷:1, and 2.5×10⁷:1 on the low end of the range and 2×10⁷:1, 2.5×10⁷:1, 5×10⁷:1, 1×10⁸:1, 2.5×10⁸:1, 5×10⁸:1, and 1×10⁹:1 on the high end of the range.

In some embodiments, the diversity of possible combinations of attached MITs using the set of MITs is preferably greater than the total number of sample nucleic acid molecules that span a target locus. For example, if there are 100 copies of the human genome that have all been fragmented into 200 bp fragments such that there are approximately 15,000,000 fragments for each genome, then it is preferable that the diversity of possible combinations of MITs be greater than 100 (the number of copies of each target locus) but less than 1,500,000,000 (the total number of nucleic acid molecules). For example, the diversity of possible combinations of MITs can be greater than 100 but much less than 1,500,000,000, such as 200, 300, 400, 500, 600, 700, 800, 900, or 1,000 possible combinations of attached MITs. While the diversity of MITs in the set of MITs is less than the total number of nucleic acid molecules, the total number of MITs in the reaction mixture is in excess of the total number of nucleic acid molecules or nucleic acid molecule segments in the reaction mixture. For example, if there are 1,500,000,000 total nucleic acid molecules or nucleic acid molecule segments, then there will be more than 1,500,000,000 total MIT molecules in the reaction mixture. In some embodiments, the diversity of MITs in the set of MITs can be lower than the number of nucleic acid molecules in a sample that span a target locus, while the diversity of the possible combinations of attached MITs using the set of MITs can be greater than the number of nucleic acid molecules in the sample that span a target locus.
For example, the ratio of the number of nucleic acid molecules in a sample that span a target locus to the diversity of MITs in the set of MITs can be at least 10:1, 25:1, 50:1, 100:1, 125:1, 150:1, or 200:1, and the ratio of the diversity of the possible combinations of attached MITs using the set of MITs to the number of nucleic acid molecules in the sample that span a target locus can be at least 1.01:1, 1.1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 20:1, 25:1, 50:1, 100:1, 250:1, 500:1, or 1,000:1.
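
The arithmetic in the 100-genome example can be checked with a short Python sketch. It assumes a haploid genome length of about 3,000,000,000 bp and equal-length, non-overlapping 200 bp fragments, so that exactly one fragment per genome copy spans any given locus; the function name and the set size of 30 MITs are hypothetical, the latter chosen so that its 900 ordered combinations fall within the example range of 200 to 1,000.

```python
def locus_and_total_counts(genome_copies: int, genome_bp: int,
                           fragment_bp: int) -> tuple:
    """Return (fragments per genome, total fragments, fragments spanning one
    locus), assuming each genome copy is cut into equal-length, non-overlapping
    fragments so that exactly one fragment per copy spans a given locus."""
    per_genome = genome_bp // fragment_bp
    return per_genome, genome_copies * per_genome, genome_copies


per_genome, total, spanning = locus_and_total_counts(100, 3_000_000_000, 200)
print(per_genome, total, spanning)  # 15000000 1500000000 100

n_mits = 30               # hypothetical set size
combos = n_mits ** 2      # 900 ordered 5'/3' combinations
print(spanning < combos < total)  # True
```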

Typically, the diversity of MITs in the set of MITs is less than the total number of sample nucleic acid molecules that span a target locus, whereas the diversity of the possible combinations of attached MITs is greater than the total number of sample nucleic acid molecules that span a target locus. In embodiments where 2 MITs are attached to sample nucleic acid molecules, the diversity of MITs in the set of MITs is less than the total number of sample nucleic acid molecules that span a target locus but greater than the square root of the total number of sample nucleic acid molecules that span a target locus. In some embodiments, the diversity of MITs is less than the total number of sample nucleic acid molecules that span a target locus but 1, 2, 3, 4, or 5 more than the square root of the total number of sample nucleic acid molecules that span a target locus. Thus, although the diversity of MITs is less than the total number of sample nucleic acid molecules that span a target locus, the total number of combinations of any 2 MITs (a set of n MITs attached at two positions yields n × n possible combinations) is greater than the total number of sample nucleic acid molecules that span a target locus. The diversity of MITs in the set is typically less than one half the number of sample nucleic acid molecules that span a target locus in samples with at least 100 copies of each target locus. In some embodiments, the diversity of MITs in the set can be at least 1, 2, 3, 4, or 5 more than the square root of the total number of sample nucleic acid molecules that span a target locus but less than 1/5, 1/10, 1/20, 1/50, or 1/100 the total number of sample nucleic acid molecules that span a target locus. For samples with between 2,000 and 1,000,000 sample nucleic acid molecules that span a target locus, the number of MITs in the set does not exceed 1,000.
For example, in a sample with 10,000 copies of the genome in a genomic DNA sample, such as a circulating cell-free DNA sample, such that the sample has 10,000 sample nucleic acid molecules that span a target locus, the diversity of MITs can be between 101 and 1,000, or between 101 and 500, or between 101 and 250. In some embodiments, the diversity of MITs in the set of MITs can be between the square root of the total number of sample nucleic acid molecules that span a target locus and 1, 10, 25, 50, 100, 125, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, or 1,000 less than the total number of sample nucleic acid molecules that span a target locus. In some embodiments, the diversity of MITs in the set of MITs can be between 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, and 80% of the number of sample nucleic acid molecules that span a target locus on the low end of the range and 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, and 99% of the number of sample nucleic acid molecules that span a target locus on the high end of the range.
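
The square-root bound for two attached MITs can be computed directly: the smallest set size n with n × n greater than the number N of molecules spanning a locus is one more than the integer square root of N. The following Python sketch (function names hypothetical) reproduces the 10,000-molecule example, taking the upper bound as 1/10 or 1/40 of N.

```python
import math


def min_mit_diversity(molecules_spanning_locus: int) -> int:
    """Smallest set size n with n * n > molecules spanning the locus, i.e. one
    more than the integer square root, for two attached MITs."""
    return math.isqrt(molecules_spanning_locus) + 1


def mit_diversity_window(molecules_spanning_locus: int,
                         max_divisor: int = 10) -> tuple:
    """(lower, upper) bounds on set size: above the square root, and up to
    1/max_divisor of the molecules spanning the locus."""
    return (min_mit_diversity(molecules_spanning_locus),
            molecules_spanning_locus // max_divisor)


print(min_mit_diversity(10_000))         # 101
print(mit_diversity_window(10_000))      # (101, 1000)
print(mit_diversity_window(10_000, 40))  # (101, 250)
```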

In some embodiments, the ratio of the total number of MITs in the reaction mixture to the total number of sample nucleic acid molecules in the reaction mixture can be between 1.01:1, 1.1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 25:1, 50:1, 100:1, 200:1, 300:1, 400:1, 500:1, 600:1, 700:1, 800:1, 900:1, 1,000:1, 2,000:1, 3,000:1, 4,000:1, 5,000:1, 6,000:1, 7,000:1, 8,000:1, 9,000:1, and 10,000:1 on the low end of the range and 25:1, 50:1, 100:1, 200:1, 300:1, 400:1, 500:1, 600:1, 700:1, 800:1, 900:1, 1,000:1, 2,000:1, 3,000:1, 4,000:1, 5,000:1, 6,000:1, 7,000:1, 8,000:1, 9,000:1, 10,000:1, 15,000:1, 20,000:1, 25,000:1, 30,000:1, 40,000:1, and 50,000:1 on the high end of the range. In some embodiments, the total number of MITs in the reaction mixture is at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.9% of the total number of sample nucleic acid molecules in the reaction mixture. In other embodiments, the ratio of the total number of MITs in the reaction mixture to the total number of sample nucleic acid molecules in the reaction mixture can be at least enough MITs for each sample nucleic acid molecule to have the appropriate number of MITs attached, i.e. 2:1 for 2 MITs being attached, 3:1 for 3 MITs, 4:1 for 4 MITs, 5:1 for 5 MITs, 6:1 for 6 MITs, 7:1 for 7 MITs, 8:1 for 8 MITs, 9:1 for 9 MITs, and 10:1 for 10 MITs.

In some embodiments, the ratio of the total number of MITs with identical sequences in the reaction mixture to the total number of nucleic acid segments in the reaction mixture can be between 0.1:1, 0.2:1, 0.3:1, 0.4:1, 0.5:1, 0.6:1, 0.7:1, 0.8:1, 0.9:1, 1:1, 1.1:1, 1.2:1, 1.3:1, 1.4:1, 1.5:1, 1.6:1, 1.7:1, 1.8:1, 1.9:1, 2:1, 2.25:1, 2.5:1, 2.75:1, 3:1, 3.5:1, 4:1, 4.5:1, and 5:1 on the low end of the range and 0.5:1, 0.6:1, 0.7:1, 0.8:1, 0.9:1, 1:1, 1.1:1, 1.2:1, 1.3:1, 1.4:1, 1.5:1, 1.6:1, 1.7:1, 1.8:1, 1.9:1, 2:1, 2.25:1, 2.5:1, 2.75:1, 3:1, 3.5:1, 4:1, 4.5:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 20:1, 30:1, 40:1, 50:1, 60:1, 70:1, 80:1, 90:1, and 100:1 on the high end of the range.

The set of MITs can include, for example, at least three MITs or between 10 and 500 MITs. As discussed herein, in some embodiments, nucleic acid molecules from the sample are added directly to the attachment reaction mixture without amplification. These sample nucleic acid molecules can be purified from a source, such as a living cell or organism, as disclosed herein, and then MITs can be attached without amplifying the nucleic acid molecules. In some embodiments, the sample nucleic acid molecules or nucleic acid segments can be amplified before attaching MITs. As discussed herein, in some embodiments, the nucleic acid molecules from the sample can be fragmented to generate sample nucleic acid segments. In some embodiments, other oligonucleotide sequences can be attached (e.g. ligated) to the ends of the sample nucleic acid molecules before the MITs are attached.

In some embodiments disclosed herein, the ratio of sample nucleic acid molecules, nucleic acid segments, or fragments that include a target locus to MITs in the reaction mixture can be between 1.01:1, 1.05:1, 1.1:1, 1.2:1, 1.3:1, 1.4:1, 1.5:1, 1.6:1, 1.7:1, 1.8:1, 1.9:1, 2:1, 2.5:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 15:1, 20:1, 25:1, 30:1, 35:1, 40:1, 45:1, and 50:1 on the low end and 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 15:1, 20:1, 25:1, 30:1, 35:1, 40:1, 45:1, 50:1, 60:1, 70:1, 80:1, 90:1, 100:1, 125:1, 150:1, 175:1, 200:1, 300:1, 400:1, and 500:1 on the high end. For example, in some embodiments, the ratio of sample nucleic acid molecules, nucleic acid segments, or fragments with a specific target locus to MITs in the reaction mixture is between 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 15:1, 20:1, 25:1, 30:1, 35:1, 40:1, 45:1, and 50:1 on the low end and 20:1, 25:1, 30:1, 35:1, 40:1, 45:1, 50:1, 60:1, 70:1, 80:1, 90:1, 100:1, and 200:1 on the high end. In some embodiments, the ratio of sample nucleic acid molecules or nucleic acid segments to MITs in the reaction mixture can be between 25:1, 30:1, 35:1, 40:1, 45:1, and 50:1 on the low end and 50:1, 60:1, 70:1, 80:1, 90:1, and 100:1 on the high end. In some embodiments, the diversity of the possible combinations of attached MITs can be greater than the number of sample nucleic acid molecules, nucleic acid segments, or fragments that span a target locus. For example, in some embodiments, the ratio of the diversity of the possible combinations of attached MITs to the number of sample nucleic acid molecules, nucleic acid segments, or fragments that span a target locus can be at least 1.01:1, 1.1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 20:1, 25:1, 50:1, 100:1, 250:1, 500:1, or 1,000:1.

Reaction mixtures for tagging nucleic acid molecules with MITs (i.e. attaching MITs to nucleic acid molecules), as provided herein, can include other reagents in addition to a population of sample nucleic acid molecules and a set of MITs. For example, the reaction mixtures for tagging can include a ligase or polymerase with suitable buffers at an appropriate pH, adenosine triphosphate (ATP) for ATP-dependent ligases or nicotinamide adenine dinucleotide (NAD) for NAD-dependent ligases, deoxynucleoside triphosphates (dNTPs) for polymerases, and optionally molecular crowding reagents such as polyethylene glycol. In certain embodiments, the reaction mixture can include a population of sample nucleic acid molecules, a set of MITs, and a polymerase or ligase, wherein the ratio of the number of sample nucleic acid molecules, nucleic acid segments, or fragments with a specific target locus to the number of MITs in the reaction mixture can be any of the ratios disclosed herein, for example between 2:1 and 100:1, or between 10:1 and 100:1, or between 25:1 and 75:1, or between 40:1 and 60:1, or between 45:1 and 55:1, or between 49:1 and 51:1.

In some embodiments disclosed herein, the number of different MITs (i.e. diversity) in the set of MITs can be between 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1,000, 1,500, 2,000, 2,500, and 3,000 MITs with different sequences on the low end and 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, and 5,000 MITs with different sequences on the high end. For example, the diversity of different MITs in the set of MITs can be between 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, and 100 different MIT sequences on the low end and 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 250, and 300 different MIT sequences on the high end. In some embodiments, the diversity of different MITs in the set of MITs can be between 50, 60, 70, 80, 90, 100, 125, and 150 different MIT sequences on the low end and 100, 125, 150, 175, 200, and 250 different MIT sequences on the high end. In some embodiments, the diversity of different MITs in the set of MITs can be between 3 and 1,000, or 10 and 500, or 50 and 250 different MIT sequences.
In some embodiments, the diversity of possible combinations of attached MITs using the set of MITs can be between 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 250, 300, 400, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 250,000, 500,000, and 1,000,000 possible combinations of attached MITs on the low end of the range and 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 250, 300, 400, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 250,000, 500,000, 1,000,000, 2,000,000, 3,000,000, 4,000,000, 5,000,000, 6,000,000, 7,000,000, 8,000,000, 9,000,000, and 10,000,000 possible combinations of attached MITs on the high end of the range.

The MITs in the set of MITs are typically all the same length. For example, in some embodiments, the MITs can be any length between 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, and 20 nucleotides on the low end and 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, and 30 nucleotides on the high end. In certain embodiments, the MITs are any length between 3, 4, 5, 6, 7, or 8 nucleotides on the low end and 5, 6, 7, 8, 9, 10, or 11 nucleotides on the high end. In some embodiments, the lengths of the MITs can be any length between 4, 5, or 6 nucleotides on the low end and 5, 6, or 7 nucleotides on the high end. In some embodiments, the length of the MITs is 5, 6, or 7 nucleotides.

As will be understood, a set of MITs typically includes many identical copies of each MIT member of the set. In some embodiments, a set of MITs includes between 10, 20, 25, 30, 40, 50, 100, 500, 1,000, 10,000, 50,000, and 100,000 times more copies on the low end of the range, and 100, 500, 1,000, 10,000, 50,000, 100,000, 250,000, 500,000, and 1,000,000 times more copies on the high end of the range, than the total number of sample nucleic acid molecules that span a target locus. For example, in a human circulating cell-free DNA sample isolated from plasma, there can be a quantity of DNA fragments that includes, for example, 1,000-100,000 circulating fragments that span any target locus of the genome. In certain embodiments, there are no more than 1/10, ¼, ½, or ¾ as many copies of any given MIT as total unique MITs in a set of MITs. Between members of the set, there can be 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 differences between any sequence and the rest of the sequences. In some embodiments, the sequence of each MIT in the set differs from all the other MITs by at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides. To reduce the chance of misidentifying an MIT, the set of MITs can be designed using methods a skilled artisan will recognize, such as taking into consideration the Hamming distances between all the MITs in the set of MITs. The Hamming distance between two equal-length strings, or nucleotide sequences, is the number of positions at which they differ, which equals the minimum number of substitutions required to change one into the other. Here, the Hamming distance measures the minimum number of substitution errors, such as amplification errors, required to transform one MIT sequence in a set into another MIT sequence from the same set. In certain embodiments, different MITs of the set of MITs have a Hamming distance of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 between each other.
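
One way to take Hamming distances into account is a greedy search, sketched below in Python. This is a simple illustrative technique, not necessarily the design method used in the present disclosure: it keeps each candidate sequence that is at least a chosen distance from every MIT already selected. With a minimum distance d, no combination of d − 1 or fewer substitutions can convert one MIT into another, and up to ⌊(d − 1)/2⌋ substitutions can be corrected by assigning a read to the nearest MIT. The sketch considers only Hamming distance; the function names and parameters are hypothetical.

```python
from itertools import combinations, product
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("MITs in a set are assumed to have equal length")
    return sum(x != y for x, y in zip(a, b))


def greedy_mit_set(length: int, min_distance: int, max_size: int) -> List[str]:
    """Scan all DNA words of the given length in lexicographic order and keep
    each one that is at least `min_distance` substitutions away from every
    word already kept, stopping once `max_size` words are chosen."""
    chosen: List[str] = []
    for letters in product("ACGT", repeat=length):
        candidate = "".join(letters)
        if all(hamming(candidate, mit) >= min_distance for mit in chosen):
            chosen.append(candidate)
            if len(chosen) == max_size:
                break
    return chosen


mits = greedy_mit_set(length=6, min_distance=3, max_size=20)
print(len(mits))                                             # 20
print(min(hamming(a, b) for a, b in combinations(mits, 2)))  # 3
```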

In certain embodiments, a set of isolated MITs as provided herein is one embodiment of the present disclosure. The set of isolated MITs can be a set of single-stranded, partially double-stranded, or fully double-stranded nucleic acid molecules, wherein each MIT is a portion of, or the entire, nucleic acid molecule of the set. In certain examples, provided herein is a set of Y-adapter (i.e. partially double-stranded) nucleic acids that each include a different MIT. The set of Y-adapter nucleic acids can each be identical except for the MIT portion. Multiple copies of the same Y-adapter MIT can be included in the set. The set can have a number and diversity of nucleic acid molecules as disclosed herein for a set of MITs. As a non-limiting example, the set can include 2, 5, 10, or 100 copies of between 50 and 500 MIT-containing Y-adapters, with each MIT segment between 4 and 8 nucleotides in length and each MIT segment differing from the other MIT segments by at least 2 nucleotides, but otherwise containing identical sequences. Further details regarding the Y-adapter portion of the set of Y-adapters are provided herein.

In other embodiments, a reaction mixture that includes a set of MITs and a population of sample nucleic acid molecules is one embodiment of the present disclosure. Furthermore, such a composition can be part of numerous methods and other compositions provided herein. For example, in further embodiments, a reaction mixture can include a polymerase or ligase, appropriate buffers, and supplemental components as discussed in more detail herein. For any of these embodiments, the set of MITs can include between 25, 50, 100, 200, 250, 300, 400, 500, or 1,000 MITs on the low end of the range, and 100, 200, 250, 300, 400, 500, 1,000, 1,500, 2,000, 2,500, 5,000, 10,000, or 25,000 MITs on the high end of the range. For example, in some embodiments, a reaction mixture includes a set of between 10 and 500 MITs.

Molecular Index Tags (MITs) as discussed in more detail herein can be attached to sample nucleic acid molecules in the reaction mixture using methods that a skilled artisan will recognize. In some embodiments, the MITs can be attached alone, or without any additional oligonucleotide sequences. In some embodiments, the MITs can be part of a larger oligonucleotide that can further include other nucleotide sequences as discussed in more detail herein. For example, the oligonucleotide can also include primers specific for nucleic acid segments or universal primer binding sites, adapters such as sequencing adapters (for example, Y-adapters), library tags, ligation adapter tags, and combinations thereof. A skilled artisan will recognize how to incorporate various tags into oligonucleotides to generate tagged nucleic acid molecules useful for sequencing, especially high-throughput sequencing. The MITs of the present disclosure are advantageous in that they are more readily used with additional sequences, such as Y-adapter and/or universal sequences, because the diversity of nucleic acid molecules is less, and therefore they can be more easily combined with additional sequences on an adapter to yield a smaller, and therefore more cost-effective, set of MIT-containing adapters.

In some embodiments, the MITs are attached such that one MIT is 5′ to the sample nucleic acid segment and one MIT is 3′ to the sample nucleic acid segment in the tagged nucleic acid molecule. For example, in some embodiments, the MITs can be attached directly to the 5′ and 3′ ends of the sample nucleic acid molecules using ligation. In some embodiments disclosed herein, ligation typically involves forming a reaction mixture with appropriate buffers, ions, and a suitable pH in which the population of sample nucleic acid molecules, the set of MITs, adenosine triphosphate, and a ligase are combined. A skilled artisan will understand how to form the reaction mixture and the various ligases available for use. In some embodiments, the nucleic acid molecules can have 3′ adenosine overhangs and the MITs can be located on double-stranded oligonucleotides having 5′ thymidine overhangs, such as directly adjacent to a 5′ thymidine.

In further embodiments, MITs provided herein can be included as part of Y-adapters before they are ligated to sample nucleic acid molecules. Y-adapters are well known in the art and are used, for example, to more effectively provide primer binding sequences to the two ends of the nucleic acid molecules before high-throughput sequencing. Y-adapters are formed by annealing a first oligonucleotide and a second oligonucleotide, where a 5′ segment of the first oligonucleotide and a 3′ segment of the second oligonucleotide are complementary, and wherein a 3′ segment of the first oligonucleotide and a 5′ segment of the second oligonucleotide are not complementary. In some embodiments, Y-adapters include a base-paired, double-stranded polynucleotide segment and an unpaired, single-stranded polynucleotide segment distal to the site of ligation. The double-stranded polynucleotide segment can be between 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length on the low end of the range and 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, and 30 nucleotides in length on the high end of the range. The single-stranded polynucleotide segments on the first and second oligonucleotides can be between 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length on the low end of the range and 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, and 30 nucleotides in length on the high end of the range. In these embodiments, MITs are typically double-stranded sequences added to the ends of Y-adapters, which are ligated to sample nucleic acid segments to be sequenced. In some embodiments, the non-complementary segments of the first and second oligonucleotides can be of different lengths.
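
The complementarity requirements that define a Y-adapter can be expressed as a simple check, sketched below in Python. Both oligonucleotides are written 5′ to 3′, the MIT and all other sequences are hypothetical, and "not complementary" is interpreted loosely here as the two single-stranded arms not being fully complementary when aligned outward from the duplex; a stricter criterion could be substituted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def is_y_adapter(first: str, second: str, duplex_len: int) -> bool:
    """Both oligonucleotides written 5'->3'. The 5' `duplex_len` bases of
    `first` must pair with the 3' `duplex_len` bases of `second`, and the
    remaining arms (3' tail of `first`, 5' tail of `second`) must not be fully
    complementary when aligned outward from the duplex."""
    if not 0 < duplex_len < min(len(first), len(second)):
        return False  # no duplex, or no single-stranded arms
    paired = first[:duplex_len] == reverse_complement(second[-duplex_len:])
    tail1, tail2 = first[duplex_len:], second[:-duplex_len]
    # tail1[i] would pair with tail2[-1 - i]; reverse_complement(tail2)[i] is
    # the base that tail1[i] must equal for that pair to form.
    rc_tail2 = reverse_complement(tail2)
    overlap = min(len(tail1), len(tail2))
    arms_fully_paired = tail1[:overlap] == rc_tail2[:overlap]
    return paired and not arms_fully_paired


mit = "ACGTAC"          # hypothetical 6-nt MIT in the paired region
duplex = mit + "GCTA"   # hypothetical 10-nt paired region
first = duplex + "TTTTTTTTTT"
second = "TTTTTTTTTT" + reverse_complement(duplex)
print(is_y_adapter(first, second, 10))  # True

# Making the second arm the reverse complement of the first closes the fork.
closed = reverse_complement("TTTTTTTTTT") + reverse_complement(duplex)
print(is_y_adapter(first, closed, 10))  # False
```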

In some embodiments, double-stranded MITs attached by ligation will have the same MIT on both strands of the sample nucleic acid molecule. In certain aspects, the tagged nucleic acid molecules derived from these two strands will be identified and used to generate paired MIT families. In downstream sequencing reactions, where single-stranded nucleic acids are typically sequenced, an MIT family can be identified by identifying tagged nucleic acid molecules with identical or complementary MIT sequences. In these embodiments, the paired MIT families can be used to verify the presence of sequence differences in the initial sample nucleic acid molecule, as discussed herein.
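
Identifying paired MIT families from single-stranded reads can be sketched as follows. The Python example assumes that the complementary MIT is observed as the reverse complement of the MIT sequence when the other strand is read 5′ to 3′, and it uses the lexicographically smaller of the two as an orientation-independent family key. Under this sketch, an MIT that is its own reverse complement cannot indicate which strand a read came from. All names and sequences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def paired_mit_families(read_mits: List[str]) -> Dict[str, Dict[str, int]]:
    """Group MITs as read from single-stranded reads so that an MIT and its
    reverse complement fall in the same family, counting each orientation."""
    families: Dict[str, Dict[str, int]] = defaultdict(lambda: {"+": 0, "-": 0})
    for mit in read_mits:
        key = min(mit, reverse_complement(mit))  # orientation-independent key
        families[key]["+" if mit == key else "-"] += 1
    return dict(families)


def duplex_supported(families: Dict[str, Dict[str, int]]) -> List[str]:
    """Families observed in both orientations, i.e. from both strands."""
    return [key for key, counts in families.items()
            if counts["+"] > 0 and counts["-"] > 0]


reads = ["ACGTTG", "CAACGT", "ACGTTG", "GGATCA"]
fams = paired_mit_families(reads)
print(fams)                    # {'ACGTTG': {'+': 2, '-': 1}, 'GGATCA': {'+': 1, '-': 0}}
print(duplex_supported(fams))  # ['ACGTTG']
```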

In some embodiments, MITs can be attached to the sample nucleic acid segment by being incorporated 5′ to forward and/or reverse PCR primers that bind sequences in the sample nucleic acid segment. In some embodiments, the MITs can be incorporated into universal forward and/or reverse PCR primers that bind universal primer binding sequences previously attached to the sample nucleic acid molecules. In some embodiments, the MITs can be attached using a combination of a universal forward or reverse primer with a 5′ MIT sequence and a forward or reverse PCR primer with a 5′ MIT sequence that binds internal binding sequences in the sample nucleic acid segment. After 2 cycles of PCR, sample nucleic acid molecules that have been amplified using both the forward and reverse primers with incorporated MIT sequences will have MITs attached 5′ to the sample nucleic acid segments and 3′ to the sample nucleic acid segments in each of the tagged nucleic acid molecules. In some embodiments, the PCR is done for 2, 3, 4, 5, 6, 7, 8, 9, or 10 cycles in the attachment step.

In some embodiments disclosed herein, the two MITs on each tagged nucleic acid molecule can be attached using similar techniques such that both MITs are 5′ to the sample nucleic acid segments or both MITs are 3′ to the sample nucleic acid segments. For example, two MITs can be incorporated into the same oligonucleotide and ligated on one end of the sample nucleic acid molecule, or two MITs can be present on the forward or reverse primer and the paired reverse or forward primer can have zero MITs. In other embodiments, more than two MITs can be attached with any combination of MITs attached to the 5′ and/or 3′ locations relative to the nucleic acid segments.

As discussed herein, other sequences can be attached to the sample nucleic acid molecules before, after, during, or with the MITs. For example, ligation adapters, often referred to as library tags or ligation adaptor tags (LTs), can be appended, with or without a universal primer binding sequence to be used in a subsequent universal amplification step. In some embodiments, the length of the oligonucleotide containing the MITs and other sequences can be between 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 nucleotides on the low end of the range and 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200 nucleotides on the high end of the range. In certain aspects, the number of nucleotides in the MIT sequences can be a percentage of the number of nucleotides in the total sequence of the oligonucleotides that include MITs. For example, in some embodiments, the MIT can be at most 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100% of the total nucleotides of an oligonucleotide that is ligated to a sample nucleic acid molecule.

After attaching MITs to the sample nucleic acid molecules through a ligation or PCR reaction, it may be necessary to clean up the reaction mixture to remove undesirable components that could affect subsequent method steps. In some embodiments, the sample nucleic acid molecules can be purified away from the primers or ligases. In other embodiments, the proteins and primers can be digested with proteases and exonucleases using methods known in the art.

After attaching MITs to the sample nucleic acid molecules, a population of tagged nucleic acid molecules is generated, itself forming embodiments of the present disclosure. In some embodiments, the size ranges of the tagged nucleic acid molecules can be between 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 250, 300, 400, or 500 nucleotides on the low end of the range and 100, 125, 150, 175, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, or 5,000 nucleotides on the high end of the range.

Such a population of tagged nucleic acid molecules can include between 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 225, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 15,000, 20,000, 30,000, 40,000, 50,000, 100,000, 200,000, 300,000, 400,000, 500,000, 600,000, 700,000, 800,000, 900,000, 1,000,000, 1,250,000, 1,500,000, 2,000,000, 2,500,000, 3,000,000, 4,000,000, 5,000,000, 10,000,000, 20,000,000, 30,000,000, 40,000,000, 50,000,000, 100,000,000, 200,000,000, 300,000,000, 400,000,000, 500,000,000, 600,000,000, 700,000,000, 800,000,000, 900,000,000, or 1,000,000,000 tagged nucleic acid molecules on the low end of the range and 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 15,000, 20,000, 30,000, 40,000, 50,000, 100,000, 200,000, 300,000, 400,000, 500,000, 600,000, 700,000, 800,000, 900,000, 1,000,000, 1,250,000, 1,500,000, 2,000,000, 2,500,000, 3,000,000, 4,000,000, 5,000,000, 6,000,000, 7,000,000, 8,000,000, 9,000,000, 10,000,000, 20,000,000, 30,000,000, 40,000,000, 50,000,000, 100,000,000, 200,000,000, 300,000,000, 400,000,000, 500,000,000, 600,000,000, 700,000,000, 800,000,000, 900,000,000, 1,000,000,000, 2,000,000,000, 3,000,000,000, 4,000,000,000, 5,000,000,000, 6,000,000,000, 7,000,000,000, 8,000,000,000, 9,000,000,000, or 10,000,000,000 tagged nucleic acid molecules on the high end of the range. In some embodiments, the population of tagged nucleic acid molecules can include between 100,000,000, 200,000,000, 300,000,000, 400,000,000, 500,000,000, 600,000,000, 700,000,000, 800,000,000, 900,000,000, or 1,000,000,000 tagged nucleic acid molecules on the low end of the range and 500,000,000, 600,000,000, 700,000,000, 800,000,000, 900,000,000, 1,000,000,000, 2,000,000,000, 3,000,000,000, 4,000,000,000, or 5,000,000,000 tagged nucleic acid molecules on the high end of the range.

In certain aspects, a percentage of the total sample nucleic acid molecules in the population of sample nucleic acid molecules can be targeted to have MITs attached. In some embodiments, at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.9% of the sample nucleic acid molecules can be targeted to have MITs attached. In other aspects, a percentage of the sample nucleic acid molecules in the population can have MITs successfully attached. In any of the embodiments disclosed herein, at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.9% of the sample nucleic acid molecules can have MITs successfully attached to form the population of tagged nucleic acid molecules. In any of the embodiments disclosed herein, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100, 200, 300, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 15,000, 20,000, 30,000, 40,000, or 50,000 of the sample nucleic acid molecules can have MITs successfully attached to form the population of tagged nucleic acid molecules.

In some embodiments disclosed herein, MITs can be oligonucleotide sequences of ribonucleotides or deoxyribonucleotides linked through phosphodiester linkages. Nucleotides as disclosed herein can refer to both ribonucleotides and deoxyribonucleotides, and a skilled artisan will recognize when either form is relevant for a particular application. In certain embodiments, the nucleotides can be selected from the group of naturally occurring nucleotides consisting of adenosine, cytidine, guanosine, uridine, 5-methyluridine, deoxyadenosine, deoxycytidine, deoxyguanosine, deoxythymidine, and deoxyuridine. In some embodiments, the MITs can include non-natural nucleotides. Non-natural nucleotides can include: sets of nucleotides that bind to each other, such as, for example, d5SICS and dNaM; metal-coordinated bases such as, for example, 2,6-bis(ethylthiomethyl)pyridine (SPy) with a silver ion and monodentate pyridine (Py) with a copper ion; universal bases that can pair with more than one or any other base such as, for example, 2′-deoxyinosine derivatives, nitroazole analogues, and hydrophobic aromatic non-hydrogen-bonding bases; and xDNA nucleobases with expanded bases. In certain embodiments, the oligonucleotide sequences can be pre-determined, while in other embodiments, the oligonucleotide sequences can be degenerate.

In some embodiments, MITs include phosphodiester linkages between the natural sugars ribose and/or deoxyribose that are attached to the nucleobase. In some embodiments, non-natural linkages can be used. These linkages include, for example, phosphorothioate, boranophosphate, phosphonate, and triazole linkages. In some embodiments, combinations of the non-natural linkages and/or the phosphodiester linkages can be used. In some embodiments, peptide nucleic acids can be used, wherein the sugar backbone is instead made of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds. In any of the embodiments disclosed herein, non-natural sugars can be used in place of the ribose or deoxyribose sugar. For example, threose can be used to generate α-(L)-threofuranosyl-(3′-2′) nucleic acids (TNA). Other linkage types and sugars will be apparent to a skilled artisan and can be used in any of the embodiments disclosed herein.

In some embodiments, nucleotides with extra bonds between atoms of the sugar can be used. For example, bridged or locked nucleic acids can be used in the MITs. These nucleic acids include a bond between the 2′-position and 4′-position of a ribose sugar.

In certain embodiments, the nucleotides incorporated into the sequence of the MIT can be appended with reactive linkers. At a later time, the reactive linkers can be mixed with an appropriately tagged molecule in suitable conditions for the reaction to occur. For example, aminoallyl nucleotides can be appended that can react with molecules linked to a reactive leaving group such as succinimidyl ester, and thiol-containing nucleotides can be appended that can react with molecules linked to a thiol-reactive group such as maleimide. In other embodiments, biotin-linked nucleotides can be used in the sequence of the MIT that can bind streptavidin-tagged molecules.

Various combinations of the natural nucleotides, non-natural nucleotides, phosphodiester linkages, non-natural linkages, natural sugars, non-natural sugars, peptide nucleic acids, bridged nucleic acids, locked nucleic acids, and nucleotides with appended reactive linkers will be recognized by a skilled artisan and can be used to form MITs in any of the embodiments disclosed herein.

Error Modeling

Referring now to FIG. 8, an illustration of a base-specific analysis and a motif-specific analysis of a sample is shown. The conventional approach includes at least four steps: determining a set of specific targets to assay (BLOCK 110), running a large number of test assays on the specific targets to generate target-specific statistics (BLOCK 112), sequencing a sample (BLOCK 114), and calling mutations for the specific targets using the generated statistics (BLOCK 116).

At BLOCK 110, a set of specific targets to be assayed is determined. Calling mutations using the conventional approach shown in FIG. 8 is limited to the specific targets determined at BLOCK 110. At BLOCK 112, dozens or hundreds of test assays may be performed for each target of interest (each target determined in BLOCK 110) to generate test data. For example, the test assays may include performing a PCR process on genetic segments extracted from a test sample. The amplified result of the PCR process may be exhaustively sequenced to generate background error statistics. For example, errors or mutations detected in the amplified result may be ascribed to errors induced by the PCR process, and a PCR propagation error rate may be estimated for the genetic sequences being assayed. A large number of test assays may be performed for each specific target to improve the estimate of the PCR propagation error rate.

At BLOCK 114, a genetic sample can be sequenced, and at BLOCK 116, mutations can be called using the determined PCR propagation error rate to account for at least some background error, and/or using other statistics generated at BLOCK 112. Mutations can only be called for the specific targets for which statistics were generated at BLOCK 112. Thus, to call mutations for a large number of targets of the sequenced sample, a very large number of test assays must be performed, which can be expensive and time-consuming.

The motif-specific approach improves on the conventional approach by making it possible to omit the large number of target-specific test assays. Instead of generating target-specific statistics, an error model that provides motif-specific statistics is used, which can be applied more generally than the target-specific approach (e.g., it can be applied to any target having the same or a similar motif as a motif used to generate test statistics). At BLOCK 120, using the methods and systems described herein, motif-specific statistics can be generated, which can constitute, or be used as part of, a motif-specific error model. Once a motif-specific error model has been established, the motif-specific approach can be implemented by sequencing a sample at BLOCK 122 and by calling mutations for targets having a specific motif using the motif-specific error model at BLOCK 124. The motif-specific error model has wide applicability. For example, a new sample can differ in at least some regards from a training sample used to generate the motif-specific error model, and it may be desirable to sequence targets for which no target-specific statistics exist (or for which existing statistics have an unacceptably high degree of uncertainty). By leveraging the tendency of background error to be motif-specific, the motif-specific error model can provide accurate estimates of the error associated with target bases in a sample that have the same motif as was analyzed and incorporated into the model, even though the target bases may be at different positions than the bases included in the training data used to generate the model. Thus, a large number of motif-specific test assays need not be performed for each sequencing and calling process for a sample to be sequenced. The motif-specific approach provides accurate estimates of expected background error, which in turn can enable highly accurate calling of mutations.
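The core idea of the motif-specific approach, pooling background-error counts by sequence context rather than by target position, can be sketched in a few lines. In this Python sketch the motif is assumed to be the trinucleotide centred on the target base (`flank=1`), error counts are pooled across all channels, and the function names are hypothetical; the disclosure does not fix a motif length here, and a practical model would likely keep separate rates per mutant base.

```python
from collections import defaultdict


def motif_at(reference: str, pos: int, flank: int = 1) -> str:
    """Sequence context centred on `pos` (a trinucleotide when flank=1)."""
    if pos - flank < 0 or pos + flank >= len(reference):
        raise ValueError("position too close to the end of the reference")
    return reference[pos - flank: pos + flank + 1]


def build_motif_error_model(training):
    """training: iterable of (motif, error_reads, total_reads) from mutation-free samples.

    Counts are pooled across every position sharing a motif, so the model
    does not depend on where in the genome the training bases were.
    """
    errors, totals = defaultdict(int), defaultdict(int)
    for motif, error_reads, total_reads in training:
        errors[motif] += error_reads
        totals[motif] += total_reads
    return {m: errors[m] / totals[m] for m in totals if totals[m] > 0}


def expected_background(model, reference, pos, depth, flank=1):
    """Expected background error reads at a new target, looked up by motif only."""
    return model[motif_at(reference, pos, flank)] * depth
```

A target base at a position never seen in training, but whose context is `ACG`, would then receive the pooled `ACG` error rate.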

The present disclosure describes systems and methods that can be used to implement the motif-specific approach described above. It also describes statistical models, algorithms, and their implementation (e.g., for recurrence monitoring (RM)). RM can detect tumor-specific mutations (targets) in a subject's plasma that are contributed by circulating tumor DNA (ctDNA). For that purpose, targeted sequencing of a subject's plasma sample can be employed. Denote the number of reads for a mutation at a certain position by E and the total number of reads at this position by X, and assume that E comes from a beta-binomial distribution with parameters X and p(α, β):

$E \sim \mathrm{BB}\left(X, p(\alpha, \beta)\right) \qquad (1)$

where p comes from a beta distribution with parameters α and β that are functions of replication efficiency and background error specific to sample preparation. These parameters can be estimated from a set of training samples with no mutations. In addition, these parameters are considered to depend on the fraction of ctDNA having the mutation, also called the real error, as opposed to the background error of the PCR process generated in sample preparation. Since the fraction of ctDNA present in the plasma sample may be unknown, α and β can be evaluated on a grid of values, and the mutation fraction that produces the highest probability for the data can be selected.
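The grid search over mutation fractions can be sketched as follows. In this Python sketch, the mapping from a candidate fraction f to α and β is a hypothetical stand-in: the mean error rate is taken as f + (1 − f)·background_rate and α + β is held at a fixed `concentration`. In the disclosure, α and β are instead derived from the PCR error model described below. The beta-binomial log-probability is written out with `math.lgamma` so the block needs no third-party packages.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(k: int, n: int, a: float, b: float) -> float:
    """log P(E = k) for E ~ BB(n, a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def best_mutation_fraction(e, x, background_rate, concentration, grid):
    """Return the f in `grid` that maximises the beta-binomial likelihood of E = e.

    Hypothetical parameterisation: mu(f) = f + (1 - f) * background_rate,
    alpha = mu * concentration, beta = (1 - mu) * concentration.
    """
    best_f, best_ll = None, -math.inf
    for f in grid:
        mu = f + (1.0 - f) * background_rate
        a, b = mu * concentration, (1.0 - mu) * concentration
        ll = betabinom_logpmf(e, x, a, b)
        if ll > best_ll:
            best_f, best_ll = f, ll
    return best_f
```

With `grid = [i * 1e-4 for i in range(201)]`, 50 error reads out of 10,000, a background rate of 10⁻³, and a large concentration, the selected fraction lands near 0.004, i.e. the observed error fraction minus the background.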

Training or Sample Data Preparation

In some RM applications, samples are prepared in the lab in the course of two separate PCR reactions. After each reaction, only a portion of the product is passed to the next stage. This may be referred to as subsampling. To simplify computations, the present disclosure models the process as one PCR reaction with combined subsampling, as illustrated in FIG. 9.

Some example implementations consider a total subsampling rate of 6×10⁻⁵ to model the process. The model assumes that a) the replication rate, or efficiency, p is constant from cycle to cycle; b) the error rate p_(e) is small compared to the replication rate; and c) an error occurs only once in the replication process, meaning that if a nucleotide base is substituted by another, it will keep replicating unchanged for the rest of the process.

Number of PCR Cycles

An RM variant calling algorithm estimates the random SNV or indel error rate during the PCR reaction. The resulting frequency of PCR-induced mutations depends on the number of PCR cycles that a sample goes through. The number of cycles increases dynamically for samples with low initial DNA amounts, as saturation is reached later. Only the library preparation PCR reaction is affected by the variable number of cycles. The starcoding reaction (targeted amplification and barcoding) is assumed to have the same number of cycles for all samples. Therefore, the total number of cycles is given by n_(total)=n_(libprep)+n_(starcoding). Based on the DNA input amount to the library preparation step, the algorithm estimates the total number of cycles to compute the expected PCR error more accurately. The number of cycles during library preparation is computed assuming

$\mathrm{starting\_copies} \cdot (1+p)^{n_{libprep}} \cdot \mathrm{libprep\_loss} = \mathrm{libprep\_output\_copies}$

where p is the replication efficiency, taken to be 0.9, libprep_loss is 0.75, libprep_output_copies = 3×10⁶, and

$\mathrm{starting\_copies} = \frac{x_{input}}{3.3 \times 10^{-3}},$

where x_(input) is the DNA input amount in nanograms (ng). The number of starcoding cycles n_(starcoding) is calibrated from the data to generate 10⁴ starting copies for samples with a 33 ng input amount.
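Solving the library-preparation relation for n_(libprep) gives n_(libprep) = ln(libprep_output_copies / (starting_copies · libprep_loss)) / ln(1 + p). The Python sketch below evaluates it with the constants stated above; the function names are hypothetical, `n_starcoding` is left as a caller-supplied value because its calibrated value is not given here, and the result is left fractional since the text does not say how it is rounded.

```python
import math


def libprep_cycles(x_input_ng: float, p: float = 0.9, libprep_loss: float = 0.75,
                   output_copies: float = 3e6, ng_per_copy: float = 3.3e-3) -> float:
    """Solve starting_copies * (1 + p)**n * libprep_loss = output_copies for n."""
    starting_copies = x_input_ng / ng_per_copy
    return math.log(output_copies / (starting_copies * libprep_loss)) / math.log(1.0 + p)


def total_cycles(x_input_ng: float, n_starcoding: float) -> float:
    """n_total = n_libprep + n_starcoding, with n_starcoding supplied by the caller."""
    return libprep_cycles(x_input_ng) + n_starcoding
```

For a 33 ng input, starting_copies is 10⁴ and n_(libprep) = ln(400)/ln(1.9) ≈ 9.33; lowering the input to 3.3 ng raises it to about 12.9 cycles, matching the statement that low-input samples go through more cycles.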

Estimating a Mutation Fraction Distribution and Parameters

The above-mentioned parameters α and β can be estimated from the expectation and variance of the error rate as follows. If μ is the expectation of the error rate after the PCR process and var is its variance, as in

$\begin{matrix}{\mu = {\left( \frac{E}{X} \right)}} & (2) \\{{var} = {\left( \frac{E}{X} \right)}} & (3)\end{matrix}$

then α and β of the corresponding Beta distribution are computed as

$\begin{matrix}{\alpha = {{\mu^{2}\frac{1 - \mu}{var}} - \mu}} & (4) \\{\beta = {{\alpha \frac{1}{\mu}} - 1}} & (5)\end{matrix}$

The following expansions can be used to estimate μ and var:

$\begin{matrix}{\mu = {{\left( \frac{E}{X} \right)} \approx {\frac{(E)}{(X)} - \frac{{Cov}\left( {E,X} \right)}{\left( {(X)} \right)^{2}} + \frac{{(E)}{(X)}}{\left( {(X)} \right)^{3}}}}} & (6) \\\begin{matrix}{{var} = {\left( \frac{E}{X} \right)}} \\{\approx {\frac{(E)}{\left( {(X)} \right)^{2}} - \frac{2{(E)}{{Cov}\left( {E,X} \right)}}{\left( {(X)} \right)^{3}} + \frac{\left( {(E)} \right)^{2}{(X)}}{\left( {(X)} \right)^{4}}}}\end{matrix} & (7)\end{matrix}$

Here, as defined above, X is the total number of reads and E is the number of reads for an error base, meaning a base that differs from the reference base. Since there are three possible changes from the reference (e.g., A can change to T, C, or G), there are three expected error rates, one for each mutant base, or channel. The total error counts come from at least two sources: mutations in tumor DNA that are present before the replication process, and erroneous substitutions during the PCR process used in sample preparation. The former is referred to as the real error, and the latter as the background error.

$E = E^{r} + E^{b} \qquad (8)$

To determine a mutation fraction, or a probability distribution thereof, the replication efficiency and the probability of background error per cycle are estimated from a set of training samples that are not expected to have any real mutations. Then, the starting count (or starting copy number) is estimated based on the PCR efficiency. Using this estimate, the expectation and variance of the total and error counts after the PCR process are computed and plugged into Equations 6 and 7. Then, using Equations 4 and 5, the mutation fraction distribution parameters α and β can be determined.

Modeling of the PCR Process and Useful Formulas

Assume that at each PCR cycle n: a) new DNA molecules are generated from the molecules present at the end of the previous cycle n−1, as governed by a binomial random process; b) molecules with a background error come from replication of errors from the previous cycle and from new errors that occur at the current cycle randomly according to the binomial random process with probability of error p_(e), with zero background errors present at the beginning of the PCR process; c) a replication error occurs once per molecule and is not reversible; and d) real errors are replicated with the same efficiency as normal molecules, and their initial quantity is a fraction of the total molecules (e.g., if the starting copy number is denoted by X₀, then there are fX₀ mutant molecules among them). Then

$\begin{matrix} X_{n} - X_{n-1} \sim B\left(X_{n-1}, p\right) & \\ E_{n}^{b} - E_{n-1}^{b} \sim B\left(X_{n-1} - E_{n-1}^{b}, p_{e}\right) + B\left(E_{n-1}^{b}, p\right) & \\ E_{0}^{r} = fX_{0} & (9) \end{matrix}$

Several values of f can be considered to find the one that fits the data best.
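The background-error part of Equation 9 can be simulated directly, which is a useful sanity check on the closed-form moments derived below. In this NumPy sketch (function name hypothetical), new error molecules are drawn among the freshly replicated normal molecules with probability p_(e)/p, so that they are marginally B(X_(n-1)−E_(n-1)^(b), p_(e)) as in Equation 9; this nesting is an assumption of the sketch, chosen to match the treatment of p_(e) in the covariance derivation later in this section, and it requires p_(e) ≤ p. Real errors (E^(r)) are omitted because they replicate exactly like total reads.

```python
import numpy as np


def simulate_background_pcr(x0, p, p_e, n_cycles, n_reps, seed=0):
    """Monte Carlo of total molecules X_n and background-error molecules E_n^b.

    Returns two integer arrays of length n_reps (one value per replicate run).
    """
    if not 0.0 <= p_e <= p <= 1.0:
        raise ValueError("need 0 <= p_e <= p <= 1")
    rng = np.random.default_rng(seed)
    x = np.full(n_reps, x0, dtype=np.int64)   # total molecules, X_0
    e = np.zeros(n_reps, dtype=np.int64)      # background errors, E_0^b = 0
    q = p_e / p if p > 0 else 0.0             # error probability per fresh normal copy
    for _ in range(n_cycles):
        err_copies = rng.binomial(e, p)            # B(E_{n-1}^b, p)
        norm_copies = rng.binomial(x - e, p)       # replicated normal molecules
        new_errors = rng.binomial(norm_copies, q)  # marginally B(X_{n-1} - E_{n-1}^b, p_e)
        x = x + err_copies + norm_copies           # X_n - X_{n-1} ~ B(X_{n-1}, p)
        e = e + err_copies + new_errors
    return x, e
```

Averaged over many replicates, `x.mean()` should approach (1+p)^(n)X₀ and `e.mean()` should approach ((1+p)^(n)−(1+p−p_(e))^(n))X₀, the expectations derived below.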

1. Expectation and Variance of Total Reads

From Equation 9, the expectation of the number of total reads conditioned on replication efficiency is given by

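$\mathbb{E}\left(X_{n} \mid p\right) = \mathbb{E}\left(X_{n-1} \mid p\right) + p\,\mathbb{E}\left(X_{n-1} \mid p\right) = (1+p)^{n}\,\mathbb{E}\left(X_{0}\right) \qquad (10)$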

The variance of this variable is given by

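$\mathbb{V}\left(X_{n} \mid p\right) = p(1-p)\,\mathbb{E}\left(X_{n-1} \mid p\right) + (1+p)^{2}\,\mathbb{V}\left(X_{n-1} \mid p\right) = (1-p)(1+p)^{n-1}\left((1+p)^{n} - 1\right)\mathbb{E}\left(X_{0}\right) + (1+p)^{2n}\,\mathbb{V}\left(X_{0}\right) \qquad (11)$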

Here, the last equality in each equation is obtained by solving the recursive relation given by the first part of the equation.

2. Expectation and Variance for the Real Error Reads

As for the total number of reads, the following equations apply for the real error:

$\begin{matrix}{{\left. \mspace{79mu} {{{\left( E_{n}^{r} \middle| p \right)} = {{f\left( {1 + p} \right)}^{n}{\left( X_{0} \right)}}}{{\left( {E_{n}^{r}p} \right)} = {{f\left( {1 - p} \right)}\left( {1 + p} \right)^{n - 1}\left( {\left( {1 + p} \right)^{n} - 1} \right)}}} \right){\left( X_{0} \right)}} + {{f^{2}\left( {1 + p} \right)}^{2n}{\left( X_{0} \right)}}} & (12)\end{matrix}$

3. Expectation and Variance for Background Error

For the sake of shorter notation, explicit reference to conditioning on p is omitted in this section, but the statistics are conditional on p.

Expectation of Background Error Reads

From Equation 9:

$\mathbb{E}\left(E_{n}^{b} \mid E_{n-1}^{b}, X_{n-1}\right) = (1+p)E_{n-1}^{b} + p_{e}\left(X_{n-1} - E_{n-1}^{b}\right)$

which gives

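$\mathbb{E}\left(E_{n}^{b}\right) = (1+p-p_{e})\,\mathbb{E}\left(E_{n-1}^{b}\right) + p_{e}\,\mathbb{E}\left(X_{n-1}\right) = (1+p-p_{e})\,\mathbb{E}\left(E_{n-1}^{b}\right) + p_{e}(1+p)^{n-1}\,\mathbb{E}\left(X_{0}\right)$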

where Equation 10 was used. Solving the recursive relation provides

$\begin{matrix}{{\left( E_{n}^{b} \right)} = {\left( {\left( {1 + p} \right)^{n} - \left( {1 + p - p_{e}} \right)^{n}} \right){\left( X_{0} \right)}}} \\{= {\left( {1 + p} \right)^{n}\left( {1 - \left( {1 - \frac{p_{e}}{1 + p}} \right)^{n}} \right){\left( X_{0} \right)}}}\end{matrix}\quad$

For subsequent derivations, the following approximation of this expression, which follows from the equation above under the assumption that p_(e) ≪ p, is used:

$\mathbb{E}\left(E_{n}^{b}\right) \approx np_{e}(1+p)^{n-1}\,\mathbb{E}\left(X_{0}\right) \qquad (13)$
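The quality of the approximation in Equation 13 can be checked numerically against the exact expression. In this Python sketch (function names hypothetical), the approximation slightly overstates the exact value; to leading order, the relative gap is about (n−1)p_(e)/(2(1+p)), roughly 5×10⁻⁴ for p = 0.9, p_(e) = 10⁻⁴, and n = 20.

```python
def expected_background_exact(mean_x0, p, p_e, n):
    """E(E_n^b) = ((1+p)^n - (1+p-p_e)^n) E(X_0)."""
    return ((1.0 + p) ** n - (1.0 + p - p_e) ** n) * mean_x0


def expected_background_approx(mean_x0, p, p_e, n):
    """Equation 13: E(E_n^b) ~ n p_e (1+p)^(n-1) E(X_0)."""
    return n * p_e * (1.0 + p) ** (n - 1) * mean_x0
```

By Bernoulli's inequality, 1 − (1 − p_(e)/(1+p))^(n) ≤ np_(e)/(1+p), so the approximation is an upper bound on the exact expectation.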

Variance of Background Error Reads

Some intermediate expressions used in the following derivation are:

$\mathbb{E}\left(E_{n}^{b} \mid E_{n-1}^{b}, X_{n-1}\right) = (1+p-p_{e})E_{n-1}^{b} + p_{e}X_{n-1} \qquad (14)$

$\mathbb{V}\left(E_{n}^{b} \mid E_{n-1}^{b}, X_{n-1}\right) = \left(p(1-p) - p_{e}(1-p_{e})\right)E_{n-1}^{b} + p_{e}(1-p_{e})X_{n-1} \qquad (15)$

These follow directly from Equation 9. In deriving the last equation, the fact that Cov(B(E_(n-1)^(b), p), B(X_(n-1)−E_(n-1)^(b), p_(e)))=0 was used.

With these, the variance of the background error can be written as

(E_(n)^(b)) = ((E_(n)^(b)|E_(n − 1)^(b)X_(n − 1))) + ((E_(n)^(b)|E_(n − 1)^(b)X_(n − 1))) =  = ((p(1 − p) − p_(e)(1 − p_(e)))E_(n − 1)^(b) + (p_(e)(1 − p_(e))X_(n − 1)) + +((1 + p − p_(e)))E_(n − 1)^(b) + p_(e)X_(n − 1) =  = (p(1 − p) − p_(e)(1 − p_(e)))(E_(n − 1)^(b)) + p_(e)(1 − p_(e))(X_(n − 1)) + +p_(e)²(X_(n − 1)) + 2p_(e)(1 + p − p_(e))Cov(E_(n − 1)^(b), X_(n − 1)) + (1 + p − p_(e))²(E_(n − 1)^(b))

In the last equation, all terms except the last two have already been computed. The very last term enters a recursive relation that provides the solution for the variance. Thus, the only term left to compute is the covariance.

The covariance term is computed separately, since it is also useful by itself: it gives the covariance of the total error with the total reads that enters Equations 6 and 7.

Cov(E_(n)^(b), X_(n)) = (Cov(E_(n)^(b), X_(n)|E_(n − 1)^(b)X_(n − 1)) + +Cov((E_(b)^(n)|E_(n − 1)^(b)X_(n − 1)), (X_(n)|E_(n − 1)^(b)X_(n − 1))) =  = (Cov(E_(n − 1)^(b) + B(E_(n − 1)^(b), p) + B(X_(n − 1) − E_(n − 1)^(b), p_(e)), X_(n − 1) + B(X_(n − 1), p)|E_(n − 1)^(b)X_(n − 1))) + Cov((E_(n)^(b)|E_(n − 1)^(b)X_(n − 1)), (X_(n)|E_(n − 1)^(b)X_(n − 1))) = T₁ + T₂

Here, B(...) stands for a random variable distributed according to a binomial distribution with the corresponding parameters, as defined in Equation 9. The two terms in the above equation are denoted by T₁ and T₂ and are computed separately below. For the next step in the derivation, the expression

$B\left(X_{n-1}, p\right) = B\left(E_{n-1}^{b}, p\right) + B\left(X_{n-1} - E_{n-1}^{b}, p\right)$

is used, which holds if X_(n-1) and E_(n-1)^(b) are constants as opposed to random variables. This is satisfied because these expressions enter conditional statistics. Using this, the first term is:

$\begin{matrix}{T_{1} = {\left( {{{Cov}\; \left( {{B\left( {E_{n - 1}^{b},p} \right)},\left. {B\left( {X_{n - 1},p} \right)} \middle| {E_{n - 1}^{b}X_{n - 1}} \right.} \right)} +} \right.}} \\{{{+ {Co}}{v\left( {{B\left( {{X_{n - 1} - E_{n - 1}^{b}},p_{e}} \right)},{\left. {B\left( {x_{n - 1},P} \right)} \middle| {E_{n - 1}^{b}X_{n - 1}} \right. =}} \right.}}} \\{= {\left( {{{Cov}\; \left( {{B\left( {E_{n - 1}^{b},p} \right)},\left. {{B\left( {E_{n - 1}^{b},p} \right)} + {B\left( {{X_{n - 1} - E_{n - 1}^{b}},p} \right)}} \middle| {E_{n - 1}^{b}X_{n - 1}} \right.} \right)} +} \right.}} \\{\left. {{+ {Co}}{v\left( {{B\left( {{X_{n - 1} - E_{n - 1}^{b}},p_{e}} \right)},\ \left. {{B\left( {E_{n - 1}^{b},p} \right)} + {B\left( {{X_{n - 1} - E_{n - 1}^{b}},p} \right)}} \middle| {E_{n - 1}^{b}x_{n - 1}} \right.} \right)}} \right) =} \\{= {\left( {{{Cov}\; \left( {{B\left( {E_{n - 1}^{b},p} \right)},\left. {B\left( {E_{n - 1}^{b},p} \right)} \middle| {E_{n - 1}^{b}X_{n - 1}} \right.} \right)} +} \right.}} \\{{{{+ {Cov}}\left( {{B\left( {E_{n - 1}^{b},p} \right)},\left. {B\left( {{X_{n - 1} - E_{n - 1}^{b}},p} \right)} \middle| {E_{n - 1}^{b}X_{n - 1}} \right.} \right)} +}} \\{{{{+ {Cov}}\left( {{B\left( {{X_{n - 1} - E_{n - 1}^{b}},p_{e}} \right)},\left. {B\left( {E_{n - 1}^{b},p} \right)} \middle| {E_{n - 1}^{b}X_{n - 1}} \right.} \right)} +}} \\\left. {{+ {Co}}{v\left( {{B\left( {{X_{n - 1} - E_{n - 1}^{b}},p_{e}} \right)},\left. {B\left( {{X_{n - 1} - E_{n - 1}^{b}},p} \right)} \middle| {E_{n - 1}^{b}X_{n - 1}} \right.} \right)}} \right)\end{matrix}$

where the second and third covariance terms in the last expression amount to zero due to considerations for the physical process being modelled. The first of them describes replication of error molecules and of normal molecules, which, conditioned on X_(n-1) and E_(n-1)^(b), are uncorrelated. The second describes replication of error molecules and creation of new error molecules, which are independent. Proceeding with the evaluation of T₁:

$\begin{matrix}{T_{1} = {\left( {{\left( {B\left( {E_{n - 1}^{b},p} \right)} \middle| {E_{n - 1}^{b}X_{n - 1}} \right)} +} \right.}} \\\left. {{+ {Co}}{v\left( {{B\left( {{X_{n - 1} - E_{n - 1}^{b}},p_{e}} \right)},\left. {B\left( {{X_{n - 1} - E_{n - 1}^{b}},p} \right)} \middle| {E_{n - 1}^{b}X_{n - 1}} \right.} \right)}} \right) \\{= {{{p\left( {1 - p} \right)}{\left( E_{n - 1}^{b} \right)}} + {{p_{e}\left( {1 - p} \right)}{\left( {X_{n - 1} - E_{n - 1}^{b}} \right)}}}}\end{matrix}$

Here, the first term follows from the definition of variance for a binomial distribution. The second term uses the following property: for two binomial random variables Y and Z distributed as Y˜B(n, p) and Z˜B(Y, q),

$\begin{matrix}{{{Co}{v\left( {Y,Z} \right)}} = {{{({YZ})} - {{(Y)}{(Z)}}} = {{\left( {\left( {YZ} \middle| Y \right)} \right)} - {{np}\; {\left( {{\left( Z \middle| y \right)} =} \right.}}}}} \\{= {{{\left( {Y\; {\left( Z \middle| Y \right)}} \right)} - {n^{2}p^{2}q}} = {{{\left( {qY}^{2} \right)} - {n^{2}p^{2}q}} =}}} \\{= {{{q\left( {{n{p\left( {1 - p} \right)}} + {n^{2}p^{2}}} \right)} - {n^{2}p^{2}q}} = {{qpn}\left( {1 - p} \right)}}}\end{matrix}$

In the present case, Y represents the number of normal molecules replicated at cycle n−1 and Z the number of error molecules generated out of those molecules, and p_(e) represents the probability that a normal molecule is both replicated and erroneously copied, so it effectively corresponds to pq in the example above.
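The identity Cov(Y, Z) = qpn(1−p) for Y˜B(n, p) and Z˜B(Y, q) is easy to confirm by simulation. This NumPy sketch (function name hypothetical) draws Y, then thins each draw with probability q, and returns the sample covariance.

```python
import numpy as np


def nested_binomial_cov(n, p, q, n_samples=200_000, seed=0):
    """Monte Carlo estimate of Cov(Y, Z) for Y ~ B(n, p) and Z ~ B(Y, q)."""
    rng = np.random.default_rng(seed)
    y = rng.binomial(n, p, size=n_samples)
    z = rng.binomial(y, q)  # one thinning draw per sampled Y
    return np.cov(y, z)[0, 1]
```

For n = 100, p = 0.5, and q = 0.2, the formula gives 0.2 · 0.5 · 100 · 0.5 = 5, and the estimate should agree to within a few hundredths.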

The second term, T₂ for the covariance expression is pretty straightforward.

$T_{2} = \mathrm{Cov}\left((1+p-p_{e})E_{n-1}^{b} + p_{e}X_{n-1}, (1+p)X_{n-1}\right) = (1+p)(1+p-p_{e})\,\mathrm{Cov}\left(E_{n-1}^{b}, X_{n-1}\right) + p_{e}(1+p)\,\mathbb{V}\left(X_{n-1}\right)$

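Putting together all the terms of the covariance expression, a recursive relation is obtained:

$\mathrm{Cov}\left(E_{n}^{b}, X_{n}\right) = (1+p)(1+p-p_{e})\,\mathrm{Cov}\left(E_{n-1}^{b}, X_{n-1}\right) + (p-p_{e})(1-p)\,\mathbb{E}\left(E_{n-1}^{b}\right) + p_{e}(1-p)\,\mathbb{E}\left(X_{n-1}\right) + p_{e}(1+p)\,\mathbb{V}\left(X_{n-1}\right)$

After substituting Equations 10, 11, and 13 on the right-hand side, the terms proportional to (1+p)^(n−1) cancel, leaving only terms proportional to (1+p)^(2(n−1)) and to (n−1)(1+p)^(n−2).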

Thus, a solution is needed for a recursive relation of the form

$a_{n} = c_{1}a_{n-1} + c_{2}d^{2(n-1)} + c_{3}(n-1)d^{n-2}$

with

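$\begin{matrix} a_{n} = \mathrm{Cov}\left(E_{n}^{b}, X_{n}\right) \\ c_{1} = (1+p)(1+p-p_{e}) \\ c_{2} = p_{e}(1-p)\,\mathbb{E}\left(X_{0}\right) + p_{e}(1+p)\,\mathbb{V}\left(X_{0}\right) \\ c_{3} = (p-p_{e})(1-p)p_{e}\,\mathbb{E}\left(X_{0}\right) \\ d = 1+p \end{matrix}$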

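After applying the recursive formula n times, the following pattern emerges, where a₀ = Cov(E₀^(b), X₀) = 0 because no background errors are present at the start of the PCR process: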

$\begin{matrix}{a_{n} = {{c_{1}^{n}a_{0}} +}} \\{{{+ {c_{2}\left( {c_{1}^{n - 1} + {c_{1}^{n - 2}d^{2}} + {c_{1}^{n - 3}\left( d^{2} \right)}^{2} + \cdots \mspace{20mu} + {c_{1}\left( d^{2} \right)}^{n - 2} + \left( d^{2} \right)^{n - 1}} \right)}} +}} \\{{{{+ c_{3}}\frac{\partial}{\partial d}\left( {c_{1}^{n - 1} + {c_{1}^{n - 2}d} + \cdots \mspace{20mu} + {c_{1}d^{n - 2}} + d^{n - 1}} \right)} =}} \\{= {{c_{2}\frac{c_{1}^{n} - \left( d^{2} \right)^{n}}{c_{1} - d^{2}}} + {c_{3}\frac{\partial}{\partial d}\frac{c_{1}^{n} - d^{n}}{c_{1} - d}}}}\end{matrix}$

where the formula for the sum of a geometric progression, S_(n)=Σ_(k=0)^(n) s^(n−k)t^(k)=s^(n)Σ_(k=0)^(n)(t/s)^(k)=(s^(n+1)−t^(n+1))/(s−t), was used. Substituting all the coefficients and simplifying the expression provides the covariance between the background error counts and the total number of reads:

$\begin{matrix}\begin{matrix}{{{Co}{v\left( {E_{n}^{b},X_{n}} \right)}} = {{{n\left( {1 + p} \right)}^{{2n} - 2}{p_{e}\left( {1 - p} \right)}{\left( X_{0} \right)}} +}} \\{{{{+ n}\left( {1 + p} \right)^{{2n} - 2}\left( {1 + p} \right)p_{e}{\left( X_{0} \right)}} +}} \\{{{{+ \left( {1 + p} \right)^{{2n} - 2}}\frac{1 - p}{p - p_{e}}{\left( X_{0} \right)}} -}} \\{{{- \left( {1 + p} \right)^{n - 1}}p_{e}\frac{1 - p}{p - p_{e}}{\left( X_{0} \right)}}}\end{matrix} & (17)\end{matrix}$

Substituting Equation 17 back into Equation 16 and grouping similar terms, the recursive relation for the variance is

$\mathbb{V}(E_n^b) = c_1\,\mathbb{V}(E_{n-1}^b) + c_2(1+p)^{n-1} + c_3(n-1)(1+p)^{n-2} + c_4(1+p)^{2(n-1)} + c_5(n-1)(1+p)^{2n-4}$

with coefficients in this expression defined as

$\begin{aligned} c_1 &= (1+p)^2 - 2(1+p)p_e + p_e^2 \\ c_2 &= \left(p_e - p_e^2 - \frac{p_e^2\left(1 - p(p+2)\right)}{p(1+p)}\right)\mathbb{E}(X_0) \\ c_3 &= \left(p_e\,p(1-p) - p_e^2\right)\mathbb{E}(X_0) \\ c_4 &= p_e^2\,\mathbb{V}(X_0) + p_e^2\frac{(1-p)(p+2)}{p(1+p)}\,\mathbb{E}(X_0) \\ c_5 &= 2p_e^2\left((1-p^2)\,\mathbb{E}(X_0) + (1+p)^2\,\mathbb{V}(X_0)\right) \end{aligned} \qquad (18)$

where only terms up to $p_e^2$ are kept. Going through a process similar to the one used for Cov to solve this recursive relation, the solution for the variance of the background error

$\begin{matrix}{{\left( E_{n}^{b} \middle| {pp_{e}} \right)} = {{c_{2}\frac{c_{1}^{n} - x^{n}}{c_{1} - x}} + {c_{3}\frac{c_{1}^{n} - x^{n} - {n{x^{n - 1}\left( {c_{1} - x} \right)}}}{\left( {c_{1} - x} \right)^{2}}}}} & (19)\end{matrix}$

is obtained, where the coefficients defined above and the notations

x=1+p

y=(1+p)²

are used.
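Equation 19, as written, can likewise be checked against direct iteration of the $c_2$ and $c_3$ terms of the variance recursion, $V_n = c_1 V_{n-1} + c_2 x^{n-1} + c_3(n-1)x^{n-2}$ with $V_0 = 0$, $x = 1+p$, and $c_1$ taken from Equation 18. This is a hedged illustration only: it covers just the $c_2$ and $c_3$ terms that appear in Equation 19, and the $c_2$ and $c_3$ values used below are arbitrary placeholders rather than values computed from data.

```python
import math


def c1_from_rates(p, p_e):
    """c_1 from Equation 18: (1+p)^2 - 2(1+p)p_e + p_e^2."""
    return (1 + p) ** 2 - 2 * (1 + p) * p_e + p_e ** 2


def variance_closed_form(n, p, p_e, c2, c3):
    """Equation 19: c2 (c1^n - x^n)/(c1 - x) + c3 (c1^n - x^n - n x^(n-1)(c1 - x))/(c1 - x)^2."""
    c1, x = c1_from_rates(p, p_e), 1 + p
    gap = c1 - x
    return (c2 * (c1 ** n - x ** n) / gap
            + c3 * (c1 ** n - x ** n - n * x ** (n - 1) * gap) / gap ** 2)


def variance_by_iteration(n, p, p_e, c2, c3):
    """Iterate V_k = c1 V_{k-1} + c2 x^(k-1) + c3 (k-1) x^(k-2) starting from V_0 = 0."""
    c1, x = c1_from_rates(p, p_e), 1 + p
    v = 0.0
    for k in range(1, n + 1):
        v = c1 * v + c2 * x ** (k - 1) + c3 * (k - 1) * x ** (k - 2)
    return v


p, p_e = 0.9, 1e-3
print(variance_closed_form(25, p, p_e, 0.3, 0.05), variance_by_iteration(25, p, p_e, 0.3, 0.05))
```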

Overview of Some Implementations

The derivations in the previous sections produce quantities conditioned on the replication efficiency per cycle p and the error rate per cycle p_e. To evaluate the unconditional mean and variance of a quantity Q, the following equations can be used:

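$\mathbb{E}(Q) = \mathbb{E}\left(\mathbb{E}(Q \mid p)\right) = \int_0^1 \mathbb{E}(Q \mid p)\,f(p)\,dp$

$\mathbb{V}(Q) = \mathbb{E}\left(\mathbb{V}(Q \mid p)\right) + \mathbb{V}\left(\mathbb{E}(Q \mid p)\right)$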

where f(p) is the distribution of p, which is to be estimated from the data. To remove the conditioning on p_e, the mean and variance of the error rate are estimated and used to evaluate expressions as $p_e = \mathrm{mean}(p_e)$ and $p_e^2 = \mathrm{var}(p_e) + \mathrm{mean}(p_e)^2$. It is also useful to compute $\mathbb{E}(X_0)$ and $\mathbb{V}(X_0)$ from data. Sequencing data, including reads at targeted positions in a genome, can be used. The present description distinguishes between reference reads $R^r$, the counts for the base specified in the reference genome, and error reads $R^e$, the counts for bases different from the reference. The total reads are then defined as $R = R^r + \sum_{\text{nonref}} R^e$. With these definitions, the following can be implemented.

4. Estimation of Efficiency and Error from the Training Data

Using a set of normal samples that are not expected to carry any cancer-related mutation, the efficiency can be estimated from the relation $R = (1+p)^n X_0$ at each position. Assuming that the starting copy or count X₀ is the same for each position, and assigning an arbitrary (relatively high) efficiency p* to positions whose number of reads R* lies in a high percentile (e.g. the 99th percentile),

$\frac{1+p}{1+p^*} = \frac{(R/X_0)^{1/n}}{(R^*/X_0)^{1/n}} \;\Rightarrow\; p = \left(\frac{R}{R^*}\right)^{1/n}(1+p^*) - 1 \qquad (20)$

Using this estimate of the efficiency, the error rate per cycle at each position can be estimated from Equation 13 as

$p_e = \frac{R^e}{n(1+p)^{n-1}X_0} = \frac{R^e(1+p)}{nR} \qquad (21)$

The mean and standard deviation of these quantities are found for each position by computing the statistics over multiple normal samples supplied in the data set. These values are later combined over bases sharing the same motifs, as described in more detail herein, and can be saved for use in calling mutations in different samples.
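The per-position estimates of Equations 20 and 21, followed by the mean and standard deviation over normal samples, could be sketched in Python as follows. This is an illustrative sketch, not the disclosed implementation: the input layout (one dictionary per normal sample mapping a position to its total and non-reference read counts), the nearest-rank percentile, and the reference efficiency `p_star = 0.95` are assumptions chosen for the example.

```python
import math
import statistics


def efficiency_from_reads(total_reads, reference_reads, p_star, n_cycles):
    """Equation 20: p = (R / R*)**(1/n) * (1 + p*) - 1."""
    return (total_reads / reference_reads) ** (1.0 / n_cycles) * (1.0 + p_star) - 1.0


def error_rate_from_reads(error_reads, total_reads, p, n_cycles):
    """Equation 21: p_e = R^e * (1 + p) / (n * R)."""
    return error_reads * (1.0 + p) / (n_cycles * total_reads)


def nearest_rank_percentile(values, q):
    """Nearest-rank percentile for q in (0, 1]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def position_statistics(samples, n_cycles, p_star=0.95, q=0.99):
    """samples: list of {position: (total_reads, error_reads)}, one dict per normal sample.

    Returns {position: (mean_p, sd_p, mean_pe, sd_pe)} over the samples
    (at least two samples are needed for the standard deviations).
    """
    per_position = {}
    for sample in samples:
        # R* is the read count at a high percentile of positions within this sample
        r_star = nearest_rank_percentile([r for r, _ in sample.values()], q)
        for position, (r, r_err) in sample.items():
            p = efficiency_from_reads(r, r_star, p_star, n_cycles)
            p_e = error_rate_from_reads(r_err, r, p, n_cycles)
            per_position.setdefault(position, []).append((p, p_e))
    result = {}
    for position, pairs in per_position.items():
        ps = [pair[0] for pair in pairs]
        pes = [pair[1] for pair in pairs]
        result[position] = (statistics.mean(ps), statistics.stdev(ps),
                            statistics.mean(pes), statistics.stdev(pes))
    return result
```

Since $R = (1+p)^n X_0$, the second form of Equation 21 lets the error rate be computed without knowing X₀ explicitly, which is why `error_rate_from_reads` only needs R, R^e, p and n.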

5. Estimation of Starting Copy for a Test Sample

Using the mean and standard deviation of the efficiency for each position found previously from normal samples, the starting copy at each position for a test sample can be estimated as

$\begin{matrix}{X_{0} = {\int_{0}^{1}{\frac{R}{\left( {1 + p} \right)^{n}}{f(p)}dp}}} & (22)\end{matrix}$

where f(p) = B(α, β) is the beta distribution whose parameters α and β are found from the mean and standard deviation of the efficiency. The mean and standard deviation of X₀ over positions belonging to the same sequenced genetic fragment can be computed and assigned to each position in the fragment.
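A numerical version of Equation 22 might look like the following sketch. It assumes that the beta parameters are obtained from the efficiency mean and standard deviation by the method of moments, evaluates the integral with a simple midpoint rule, and uses a hypothetical `fragment_of` mapping (position to fragment identifier) to pool X₀ estimates over a fragment; none of these choices is prescribed by the text.

```python
import math
import statistics


def beta_params_from_moments(mean, sd):
    """Method-of-moments beta parameters; requires 0 < mean < 1 and sd**2 < mean*(1-mean)."""
    common = mean * (1.0 - mean) / sd ** 2 - 1.0
    return mean * common, (1.0 - mean) * common


def beta_pdf(x, a, b):
    """Beta(a, b) density at x in (0, 1), evaluated in log space."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))


def starting_copy(reads, n_cycles, eff_mean, eff_sd, steps=20000):
    """Equation 22: X0 = integral over (0, 1) of R / (1 + p)**n * f(p) dp (midpoint rule)."""
    a, b = beta_params_from_moments(eff_mean, eff_sd)
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        total += reads / (1.0 + p) ** n_cycles * beta_pdf(p, a, b) * h
    return total


def fragment_statistics(x0_by_position, fragment_of):
    """Mean and SD of X0 over positions sharing a fragment, assigned back to each position."""
    groups = {}
    for position, x0 in x0_by_position.items():
        groups.setdefault(fragment_of[position], []).append(x0)
    summary = {frag: (statistics.mean(v), statistics.pstdev(v)) for frag, v in groups.items()}
    return {position: summary[fragment_of[position]] for position in x0_by_position}
```

Because $(1+p)^{-n}$ is convex in p, averaging over f(p) gives a slightly larger X₀ than plugging in the mean efficiency alone.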

6. Adjusting Efficiency for a Test Sample

In some implementations, the efficiency values can be updated or corrected based on the starting copy found above, according to

$p = \int\left(\left(\frac{R}{x_0}\right)^{1/n} - 1\right) g(x_0)\,dx_0 \qquad (23)$

where g(x₀) = N(μ, σ) is the normal distribution whose mean μ and standard deviation σ are those found for the starting copy at the particular position.
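Equation 23 could be evaluated numerically along the lines of the sketch below. As assumptions of this illustration, the normal density is truncated to μ ± 6σ (and to positive x₀) and renormalized over that window, and a midpoint rule is used; the function name `adjusted_efficiency` is hypothetical.

```python
import math


def adjusted_efficiency(reads, n_cycles, x0_mean, x0_sd, steps=4000, width=6.0):
    """Equation 23: p = integral of ((R / x0)**(1/n) - 1) g(x0) dx0 with g = N(mean, sd).

    The normal density is truncated to mean +/- width*sd (and to x0 > 0) and
    renormalized over that window, so its normalizing constant cancels.
    """
    low = max(x0_mean - width * x0_sd, 1e-12)
    high = x0_mean + width * x0_sd
    h = (high - low) / steps
    weighted_sum = 0.0
    weight_total = 0.0
    for i in range(steps):
        x0 = low + (i + 0.5) * h
        weight = math.exp(-0.5 * ((x0 - x0_mean) / x0_sd) ** 2) * h
        weighted_sum += ((reads / x0) ** (1.0 / n_cycles) - 1.0) * weight
        weight_total += weight
    return weighted_sum / weight_total


reads = 1.9 ** 20 * 50.0
print(adjusted_efficiency(reads, 20, 50.0, 2.0))  # close to 0.9
```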

Training Algorithms

To determine the mutation fraction distribution, appropriate training can be used to estimate the distribution parameters.

7. Base Specific Training

For base-specific training, the model parameters can be estimated separately for each base in the target panel. A basic assumption of this training process is that each base in the panel has its own amplification rate and error rate. This training method uses control samples from normal subjects; for example, 20 to 30 normal samples can be used to estimate the model parameters. Algorithm 1 below outlines the basic flow of the base-specific error model.

Algorithm 1 Base-specific training algorithm
Training: $D_{i,k} = (R_{i,k}, \mathrm{RefAllele}_i, A_{i,k}, C_{i,k}, G_{i,k}, T_{i,k})$, where $i \in \{1, 2, \ldots, B\}$ denotes a base and $k \in \{1, 2, \ldots, n\}$ denotes a sample; $\mathrm{RefAllele}_i$ is the reference/wild-type allele for base i; $R_{i,k}$ is the total read depth; and $A_{i,k}, C_{i,k}, G_{i,k}, T_{i,k}$ are the numbers of reads from alleles A, C, G, and T, respectively.
Test: $D_i^{Test} = (R_i^{Test}, \mathrm{RefAllele}_i, A_i^{Test}, C_i^{Test}, G_i^{Test}, T_i^{Test})$ for $i = 1, 2, \ldots, B$.
Result: mutation call confidence scores for non-reference alleles in the test set for all bases $1, 2, \ldots, B$.
for i = 1, 2, . . . , B do
1. Estimate the efficiency and error for base i from the training data $D_{i,k}$, as explained above.
2. Estimate the starting copy for base i in the test data, using the methods described above.
3. Adjust the efficiency parameter at base i, using the methods described above.
4. For a grid of candidate mutation fractions $\theta \in [0, \tau_{max}]$ (where $\tau_{max}$ is ideally 1, but for practical purposes it suffices to set $\tau_{max} \approx 0.15$), plug the estimated efficiency and error parameters into Equations (6) and (7) to compute the likelihood $L(\theta)$ of the test data using the beta-binomial model in (1).
5. Find the maximum likelihood estimate of θ, $\hat{\theta}_{MLE} := \arg\max_\theta L(\theta)$.
6. Compute the confidence score as $C = \frac{L(\hat{\theta}_{MLE})}{L(\hat{\theta}_{MLE}) + L(0)}$.
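Steps 4 to 6 of Algorithm 1 (grid search over θ, maximum likelihood estimate, and confidence score) might be sketched as follows. Equations (6) and (7) and the beta-binomial model (1) are not reproduced in this section, so `toy_log_likelihood` is a hypothetical stand-in: it assumes that the expected non-reference read fraction is θ + (1 − θ)·ε for a background error fraction ε, with an assumed beta-binomial concentration κ. Only the grid search and the confidence formula $C = L(\hat{\theta}_{MLE})/(L(\hat{\theta}_{MLE}) + L(0))$ follow the algorithm directly; the confidence is computed in log space to avoid underflow.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k, depth, a, b):
    """log P(K = k) for K ~ BetaBinomial(depth, a, b)."""
    log_choose = math.lgamma(depth + 1) - math.lgamma(k + 1) - math.lgamma(depth - k + 1)
    return log_choose + log_beta(k + a, depth - k + b) - log_beta(a, b)


def toy_log_likelihood(theta, alt_reads, depth, background_error, concentration):
    """Hypothetical stand-in for Equations (6), (7) and model (1)."""
    mean = theta + (1.0 - theta) * background_error
    return beta_binomial_logpmf(alt_reads, depth,
                                mean * concentration, (1.0 - mean) * concentration)


def grid_mle_with_confidence(log_likelihood, tau_max=0.15, steps=150):
    """Return (theta_hat, C) with C = L(theta_hat) / (L(theta_hat) + L(0))."""
    grid = [tau_max * i / steps for i in range(steps + 1)]
    log_l = [log_likelihood(theta) for theta in grid]
    best = max(range(len(grid)), key=lambda i: log_l[i])
    # Dividing numerator and denominator by L(theta_hat) keeps the ratio numerically stable
    confidence = 1.0 / (1.0 + math.exp(log_l[0] - log_l[best]))
    return grid[best], confidence


theta_hat, conf = grid_mle_with_confidence(
    lambda t: toy_log_likelihood(t, alt_reads=500, depth=10000,
                                 background_error=1e-3, concentration=1000.0))
print(theta_hat, conf)
```

Note that when the grid maximum is at θ = 0, the score is exactly 0.5, so scores near 1 indicate that some nonzero mutation fraction explains the data much better than background error alone.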

8. Motif-Specific Training

Motif-specific training is useful in part because the sequence context around the base of interest contributes to the PCR error rate. Thus, an error model can be generated from training data for each 3-base motif such that the base of interest is always the middle base. Other motifs can be used alternatively or additionally. For example, a motif may include one or more adjacent bases on only one side of the target base, or may include a symmetric (equal) or an asymmetric (unequal) number of bases on the two sides of the target base. Any number of adjacent bases may be defined as a motif. The motif-specific error model estimates the error parameters of the middle base for each motif while keeping the flanking bases the same (e.g. it estimates the error parameters for ATA→ACA, GTC→GAC, etc.). For example, in some implementations the algorithm estimates the error for

AAAATC → AAAACC, GATCA → GACCA, GTGGC → GCGGC, . . .

Dynamic flanking bases may also be implemented, and motifs may vary based on the sequence context. In some embodiments, the motif comprises 1, 2, 3, 4, or 5 adjacent bases before the target base. In some embodiments, the motif comprises 1, 2, 3, 4, or 5 adjacent bases after the target base.
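Extracting a motif around a target base, including the asymmetric motifs listed above, can be sketched as follows. The function names and the `REF>ALT` key format are hypothetical conventions for this illustration; positions are 0-based, and bases too close to either end of the sequence to have a full motif are skipped.

```python
def motif_at(sequence, index, before=1, after=1):
    """Return the motif spanning `before` bases before and `after` bases after sequence[index]."""
    start, end = index - before, index + after + 1
    if start < 0 or end > len(sequence):
        return None  # not enough flanking context near the ends
    return sequence[start:end]


def substitution_key(sequence, index, alt_base, before=1, after=1):
    """Key such as 'ATA>ACA': the reference motif and the motif with the target base replaced."""
    ref_motif = motif_at(sequence, index, before, after)
    if ref_motif is None:
        return None
    alt_motif = ref_motif[:before] + alt_base + ref_motif[before + 1:]
    return f"{ref_motif}>{alt_motif}"


print(substitution_key("GATAC", 2, "C"))                      # ATA>ACA
print(substitution_key("GGATCAT", 3, "C", before=2, after=2))  # GATCA>GACCA
```

Keying error statistics on such strings lets bases from different positions that share a context be pooled together.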

Estimating Parameters for Motifs

Some implementations include performing the following steps:

1. From the training set, remove (base, channel) data pairs with error rates greater than or equal to α, where α = min{a predetermined number (e.g. 0.2), a predetermined percentile of the error rates in the training sample (e.g. the 99th percentile)}.
2. Compute the per-cycle error rate per base per channel.
3. Compute the mean and variance per motif using a grouped or pooled mean and variance formula. For example, if μ₁, μ₂, . . . , μ_n are the means and σ₁², σ₂², . . . , σ_n² are the variances of the error rates of bases that share the same motif, then the pooled mean and variance may be calculated as

${\mu_{pooled} = {\frac{1}{n}{\sum\limits_{i = 1}^{n}\mu_{i}}}}{\sigma_{pooled}^{2} = {\frac{1}{n}{\sum\limits_{i = 1}^{n}\sigma_{i}^{2}}}}$

4. If there are multiple training runs, the pooling can be done stepwise, first pooling samples within individual runs and then pooling across all runs. When pooling runs, the error rates can be weighted by the number of occurrences of the motif in each run. In other implementations, the error rates are averaged without weighting.
5. Since the efficiency is not necessarily a function of the motif, the efficiency parameter need not be averaged separately for each motif. Instead, the means and variances of the efficiency parameter are averaged over all samples to produce a single prior estimate for the efficiency parameters. This prior estimate is no longer position dependent. In other implementations, the efficiency parameter may be determined on a motif-specific basis, similarly to the determination of the motif-specific error rates.
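Steps 1 to 4 above could be sketched in Python as follows. The input layout (a list of `(motif, per-sample error rates)` records per run), the nearest-rank percentile, the use of sample variances, and applying the α filter to individual per-sample rates are all assumptions made for this illustration.

```python
import math
import statistics
from collections import defaultdict


def error_threshold(rates, cap=0.2, q=0.99):
    """alpha = min(cap, q-th nearest-rank percentile of the observed error rates)."""
    ordered = sorted(rates)
    rank = max(1, math.ceil(q * len(ordered)))
    return min(cap, ordered[rank - 1])


def pool_run(records, cap=0.2, q=0.99):
    """records: list of (motif, [per-sample per-cycle error rates]) for one run.

    Returns {motif: (pooled_mean, pooled_variance, number_of_bases)}.
    """
    alpha = error_threshold([r for _, rates in records for r in rates], cap, q)
    per_motif = defaultdict(list)
    for motif, rates in records:
        kept = [r for r in rates if r < alpha]  # step 1: drop rates >= alpha
        if len(kept) >= 2:
            per_motif[motif].append((statistics.mean(kept), statistics.variance(kept)))
    # step 3: pooled mean = average of means, pooled variance = average of variances
    return {m: (statistics.mean([mu for mu, _ in v]),
                statistics.mean([var for _, var in v]),
                len(v))
            for m, v in per_motif.items()}


def pool_runs(run_summaries, weighted=True):
    """Step 4: combine per-run summaries, optionally weighting by motif occurrences per run."""
    combined = defaultdict(lambda: [0.0, 0.0, 0.0])
    for summary in run_summaries:
        for motif, (mu, var, count) in summary.items():
            w = count if weighted else 1
            acc = combined[motif]
            acc[0] += w * mu
            acc[1] += w * var
            acc[2] += w
    return {m: (s_mu / w, s_var / w) for m, (s_mu, s_var, w) in combined.items()}
```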

Some implementations include fitting a regression model of the estimated efficiency values, using the amplicon GC content, temperature, and so forth as covariates, and using this model to estimate the prior parameters instead of using a constant prior.
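A minimal sketch of such a regression prior, assuming a single covariate (amplicon GC content) and ordinary least squares; with several covariates (GC content, temperature, and so forth), a multiple-regression solver would take the place of the closed-form slope and intercept. Using the residual variance as the prior variance is an assumption of this illustration, and both function names are hypothetical.

```python
def fit_efficiency_prior(gc_content, efficiency):
    """Ordinary least squares fit: efficiency ~ intercept + slope * gc_content."""
    n = len(gc_content)
    mean_x = sum(gc_content) / n
    mean_y = sum(efficiency) / n
    sxx = sum((x - mean_x) ** 2 for x in gc_content)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc_content, efficiency))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(gc_content, efficiency)]
    residual_var = sum(r * r for r in residuals) / (n - 2)  # needs at least 3 amplicons
    return intercept, slope, residual_var


def prior_for_amplicon(model, gc):
    """Prior mean and variance of the efficiency for an amplicon with the given GC content."""
    intercept, slope, residual_var = model
    return intercept + slope * gc, residual_var


model = fit_efficiency_prior([0.4, 0.5, 0.6], [0.9, 0.85, 0.8])
print(prior_for_amplicon(model, 0.45))
```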

Algorithm 2 Motif-specific training algorithm
Training Data: $D_{i,k} = (R_{i,k}, \mathrm{RefAllele}_i, A_{i,k}, C_{i,k}, G_{i,k}, T_{i,k})$, where $i \in \{1, 2, \ldots, B_{Training}\}$ denotes a base and $k \in \{1, 2, \ldots, n\}$ denotes a sample; $\mathrm{RefAllele}_i$ is the reference/wild-type allele for base i; $R_{i,k}$ is the total read depth; and $A_{i,k}, C_{i,k}, G_{i,k}, T_{i,k}$ are the numbers of reads from alleles A, C, G, and T, respectively. $M_{i,k}$ denotes the motif for the i-th base in sample k, where $M_{i,k} \in \mathcal{M} := \{X_1X_2X_3 : X_j \in \{A, C, G, T\}\ \forall j\}$.
Test Data: $D_i^{Test} = (R_i^{Test}, \mathrm{RefAllele}_i, A_i^{Test}, C_i^{Test}, G_i^{Test}, T_i^{Test}, M_i^{Test})$ for $i = 1, 2, \ldots, B_{Test}$.
Result: mutation call confidence scores for non-reference alleles in the test set for all bases $1, 2, \ldots, B_{Test}$.
Training block:
1. Let α = min{a predetermined threshold, a predetermined percentile of the observed hetrates in the training data}.
2. For all $i = 1, 2, \ldots, B_{Training}$ and $k = 1, 2, \ldots, n$, compute the per-cycle efficiency $p_{i,k}$ and error rate $p_{e,i,k}$ using the data $D_{i,k}$. If the hetrate is ≥ α for some (base, channel) combination, skip error estimation for that combination.
3. Group the bases by motif, so that bases sharing the same motif are assigned to the same group, forming M groups.
4. For each $m \in \mathcal{M}$, compute the mean and variance of the error rates for m using the grouped data.
5. Pool all bases together to compute the mean and variance of the efficiency parameter.
Test block: for i = 1, 2, . . . , $B_{Test}$ do
1. If the motif for base i is $m_i$, use the universal efficiency parameters from the last training step and the error parameters for motif $m_i$ in the subsequent steps.
2. Estimate the starting copy for base i in the test data.
3. Adjust the efficiency parameter at base i.
4. For a grid of candidate mutation fractions $\theta \in [0, \tau_{max}]$ (where $\tau_{max}$ is ideally 1, but for practical purposes it suffices to set $\tau_{max} \approx 0.15$), plug the estimated efficiency and error parameters into Equations (6) and (7) to compute the likelihood $L(\theta)$ of the test data using the beta-binomial model in (1).
5. Find the maximum likelihood estimate of θ, $\hat{\theta}_{MLE} := \arg\max_\theta L(\theta)$.
6. Compute the confidence score as $C = \frac{L(\hat{\theta}_{MLE})}{L(\hat{\theta}_{MLE}) + L(0)}$.

Referring now to FIG. 10, FIG. 10 is a block diagram showing an embodiment of an error analysis system 300. The error analysis system 300 can include one or more processors 301 and a memory 302. The one or more processors 301 may include one or more microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), etc., or combinations thereof. The memory 302 may include, but is not limited to, electronic, magnetic, or any other storage or transmission device capable of providing the processor with program instructions. The memory may include a magnetic disk, memory chip, read-only memory (ROM), random-access memory (RAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), flash memory, or any other suitable memory from which the processor can read instructions. The memory 302 may include components, subsystems, modules, scripts, applications, or one or more sets of processor-executable instructions for implementing error analysis processes, including any processes described herein. For example, the memory 302 may include training data 304, a replication efficiency analyzer 306, a replication error analyzer 312, a statistics engine 314, an initial count estimator 318, a distribution determiner 320, and a mutation caller 322.

The training data 304 can include, for example, data of the following type: $(R_{i,k}, \mathrm{RefAllele}_i, A_{i,k}, C_{i,k}, G_{i,k}, T_{i,k})$, where $i \in \{1, 2, \ldots, B_{Training}\}$ denotes a base and $k \in \{1, 2, \ldots, n\}$ denotes a sample, $\mathrm{RefAllele}_i$ is the reference/wild-type allele for base i, $R_{i,k}$ is the total read depth, and $A_{i,k}, C_{i,k}, G_{i,k}, T_{i,k}$ are the numbers of reads from alleles A, C, G, and T, respectively. $M_{i,k}$ denotes the motif for the i-th base in sample k, where $M_{i,k} \in \mathcal{M} := \{X_1X_2X_3 : X_j \in \{A, C, G, T\}\ \forall j\}$. The training data may be derived from one or more samples taken from one or more subjects. The training data may include only genetic material that does not include mutations of interest (e.g. mutations for which a mutation fraction is being determined).

The replication efficiency analyzer 306 may include components, subsystems, modules, scripts, applications, or one or more sets of processor-executable instructions for determining a replication efficiency of a PCR process using the training data. The replication efficiency analyzer 306 may include an initial efficiency estimator 308 that determines an initial estimate of the replication efficiency. For example, the replication efficiency analyzer 306 may estimate the replication efficiency from the relation $R = (1+p)^n X_0$ at each position, and may determine the initial replication efficiency estimate using Equation 20. The replication efficiency analyzer 306 may also include an efficiency updater 310. The efficiency updater 310 may update or correct an initial efficiency estimate using an initial count determined by the initial count estimator 318 (described in more detail below), for example using Equation 23.

The replication error analyzer 312 may include components, subsystems, modules, scripts, applications, or one or more sets of processor-executable instructions for determining a replication error rate. For example, the replication error analyzer 312 can determine an error rate per cycle at each position using Equation 21. The determined error rate may correspond to background error, including error induced by the PCR process. The replication error analyzer 312 can determine the error rate per cycle at each position using the training data (e.g. based on the number of erroneous reads and the total number of reads).

The statistics engine 314 may include components, subsystems, modules, scripts, applications, or one or more sets of processor-executable instructions for determining statistical values for the replication efficiencies determined by the replication efficiency analyzer 306 and for the replication error rates determined by the replication error analyzer 312. For example, the statistics engine 314 may determine a mean or estimated replication efficiency, and a variance thereof, based on the replication efficiencies determined by the replication efficiency analyzer 306. The statistics engine 314 may, for example, determine the mean over all analyzed samples in a position-independent manner.

The statistics engine 314 may determine a mean or estimated replication error rate, and a variance thereof, based on the replication error rates determined by the replication error analyzer 312. The mean or estimated replication error rate may be motif-specific. For example, the statistics engine 314 may include a motif aggregator 316 that groups the target bases to be analyzed by motif (that is, into groups in which all target bases of a group have the same motif). In some implementations, the motif aggregator 316 references a data structure that specifies the motif parameters (e.g. a first number of adjacent bases sequentially preceding the target base, and a second number of adjacent bases sequentially following the target base) that define the motifs. For example, if a plurality of mean replication error rates μ₁, μ₂, . . . , μ_n and a plurality of variances thereof σ₁², σ₂², . . . , σ_n² are determined by the statistics engine 314 based on data determined by the replication error analyzer 312, the motif-specific grouped mean and variance may be calculated as

${\mu_{pooled} = {\frac{1}{n}{\sum\limits_{i = 1}^{n}\mu_{i}}}}{\sigma_{pooled}^{2} = {\frac{1}{n}{\sum\limits_{i = 1}^{n}\sigma_{i}^{2}}}}$

The grouping can be done stepwise, first grouping samples within individual runs and then grouping across all runs. When grouping runs, the error rates can be weighted by the number of occurrences of the motif in each run. In other implementations, the error rates are averaged without weighting.

The statistics engine 314 may implement a filtering policy to sanitize the data. For example, the statistics engine 314 may remove from the training set (base, channel) data pairs with error rates greater than or equal to α, where α = min{a predetermined number (e.g. 0.2), a predetermined percentile of the error rates in the training sample (e.g. the 99th percentile)}.

The initial count estimator 318 may include components, subsystems, modules, scripts, applications, or one or more sets of processor-executable instructions for determining an initial count of a target base for one or more samples. For example, the initial count estimator 318 may use Equation 22 to determine a plurality of initial count estimates for each base being analyzed. The initial count estimator 318 (or, in some implementations, the statistics engine 314) may determine a plurality of estimates or mean values for the initial count, and variances thereof, over positions belonging to the same sequenced genetic fragment, and may assign those values to each position in the genetic fragment. Those values may be used by the efficiency updater 310 to update an initial efficiency estimate, as described herein.

The distribution determiner 320 may include components, subsystems, modules, scripts, applications, or one or more sets of processor-executable instructions for determining parameters for a distribution representing a mutation fraction of one or more analyzed samples. For example, the distribution determiner 320 may determine parameters for a beta-binomial distribution of the mutation fraction. The distribution determiner 320 may, for a grid of candidate mutation fractions $\theta \in [0, \tau_{max}]$ (where $\tau_{max}$ is ideally 1, but for practical purposes it suffices to set $\tau_{max} \approx 0.15$), plug the estimated efficiency and error parameters into Equations (6) and (7) to compute the likelihood $L(\theta)$ of the test data using the beta-binomial model in (1). The distribution determiner 320 may select the highest-likelihood mutation fraction as the determined mutation fraction for the one or more analyzed samples.

The mutation caller 322 may include components, subsystems, modules, scripts, applications, or one or more sets of processor-executable instructions for determining parameters for calling mutations. The mutation caller 322 may call a mutation when one or more parameter values are equal to or above a predetermined threshold. For example, the parameter values can include a mutation fraction, an absolute number of detected errors or mutations, or the number of standard deviations by which those parameter values deviate from a reference or mean value. The mutation caller 322 may also determine a confidence for the called mutation (e.g. based at least in part on the difference between the parameter value and the threshold).
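The threshold rule described for the mutation caller 322 might be sketched as follows. The z-score helper and the logistic mapping from (value − threshold) to a confidence are hypothetical choices for this illustration; the text only states that the confidence depends at least in part on the difference between the parameter value and the threshold.

```python
import math


def z_score(value, background_mean, background_sd):
    """Number of standard deviations by which value exceeds the background mean."""
    return (value - background_mean) / background_sd


def call_mutation(value, threshold, scale=1.0):
    """Call a mutation when value >= threshold.

    Confidence rises with (value - threshold) through a logistic curve
    (a hypothetical mapping); it equals 0.5 exactly at the threshold.
    """
    called = value >= threshold
    confidence = 1.0 / (1.0 + math.exp(-(value - threshold) / scale))
    return called, confidence


# Example: an observed fraction of 0.011 against a background of 0.001 +/- 0.002
print(call_mutation(z_score(0.011, 0.001, 0.002), threshold=3.0))
```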

Referring now to FIG. 11, a method for calling a mutation using a motif-specific error model is shown. The method includes BLOCK 402 through BLOCK 410. In brief overview, at BLOCK 402, the error analysis system 300 determines, for each target base of a plurality of target bases, a respective value for a background error parameter based on training data. At BLOCK 404, the error analysis system 300 identifies a respective motif for each target base. At BLOCK 406, the error analysis system 300 groups the target bases into groups, each group corresponding to a particular motif. At BLOCK 408, the error analysis system 300 determines, for each group, a respective motif-specific parameter value for the background error. At BLOCK 410, the error analysis system 300 calls a mutation using the motif-specific error model and sequencing information.

In more detail, at BLOCK 402, the error analysis system 300 determines, for each target base of a plurality of target bases, a respective value for a background error parameter based on training data. For example, the replication error analyzer 312 can determine an error rate per cycle for each target base of the plurality of target bases using Equation 21. The determined error rate may correspond to background error, including error induced by the PCR process. The replication error analyzer 312 can determine the error rate per cycle at each position using the training data (e.g. based on the number of erroneous reads and the total number of reads).

At BLOCK 404, the error analysis system 300 identifies a respective motif for each target base, and at BLOCK 406, the error analysis system 300 groups the target bases into groups, each group corresponding to a particular motif. For example, the motif aggregator 316 references a data structure that specifies the motif parameters (e.g. a first number of adjacent bases sequentially preceding the target base, and a second number of adjacent bases sequentially following the target base) that define the motifs. If a plurality of mean replication error rates μ₁, μ₂, . . . , μ_n and a plurality of variances thereof σ₁², σ₂², . . . , σ_n² are determined by the statistics engine 314 based on data determined by the replication error analyzer 312, the motif-specific grouped mean and variance may be calculated as

${\mu_{pooled} = {\frac{1}{n}{\sum\limits_{i = 1}^{n}\mu_{i}}}}{\sigma_{pooled}^{2} = {\frac{1}{n}{\sum\limits_{i = 1}^{n}\sigma_{i}^{2}}}}$

The grouping can be done stepwise, first grouping samples within individual runs and then grouping across all runs. When grouping runs, the error rates can be weighted by the number of occurrences of the motif in each run. In other implementations, the error rates are averaged without weighting.

At BLOCK 408, the error analysis system 300 determines, for each group, a respective motif-specific parameter value for the background error. For example, the statistics engine 314 may determine a mean or estimated replication error rate, and a variance thereof, for each group determined by the motif aggregator 316. Thus, the determined mean or estimated replication error rate may be motif-specific.

At BLOCK 410, the error analysis system 300 calls a mutation using the motif-specific error model and sequencing information. For example, the distribution determiner 320 may determine parameters for a beta-binomial distribution of the mutation fraction. The distribution determiner 320 may, for a grid of candidate mutation fractions $\theta \in [0, \tau_{max}]$ (where $\tau_{max}$ is ideally 1, but for practical purposes it suffices to set $\tau_{max} \approx 0.15$), plug the estimated efficiency and error parameters into Equations (6) and (7) to compute the likelihood $L(\theta)$ of the test data using the beta-binomial model in (1). The distribution determiner 320 may select the highest-likelihood mutation fraction as the determined mutation fraction for the one or more analyzed samples. The mutation caller 322 may call a mutation when one or more parameter values are equal to or above a predetermined threshold. For example, the parameter values can include the mutation fraction determined by the distribution determiner 320. The mutation caller 322 may also determine a confidence for the called mutation (e.g. based at least in part on the difference between the parameter value and the threshold). Thus, a mutation can be called accurately using a motif-specific approach.

Referring now to FIG. 12, a method for determining a distribution for a mutation fraction is shown. The method includes BLOCK 502 through BLOCK 512. In brief overview, at BLOCK 502, the error analysis system 300 determines, for each target base of a plurality of target bases, a respective replication efficiency based on training data, and a corresponding mean and variance. At BLOCK 504, the error analysis system 300 determines, for each target base of the plurality of target bases, a respective replication error rate and a corresponding mean and variance. At BLOCK 506, the error analysis system 300 determines a plurality of motif-specific replication error rates and corresponding means and variances. At BLOCK 508, the error analysis system 300 determines an initial count for each of the target bases based on the mean and variance of the corresponding replication efficiency. At BLOCK 510, the error analysis system 300 determines an expectation and a variance of a total count for each of the target bases, and an expectation and a variance of an error count. At BLOCK 512, the error analysis system 300 determines a distribution for the mutation fraction based on the expectation and the variance of the total count for each of the target bases and the expectation and the variance of the error count.

In more detail, at BLOCK 502, the replication efficiency analyzer 306 may determine an initial estimate of the replication efficiency. For example, the replication efficiency analyzer 306 may estimate the replication efficiency from the relation $R = (1+p)^n X_0$ at each position, using Equation 20 to determine the initial replication efficiency estimate. The statistics engine 314 can determine the corresponding mean values and variances.

At BLOCK 504, the replication error analyzer 312 may determine an error rate per cycle at each position using Equation 21. The determined error rate may correspond to background error, including error induced by the PCR process. The replication error analyzer 312 can determine the error rate per cycle at each position using the training data (e.g. based on the number of erroneous reads and the total number of reads). The statistics engine 314 can determine the corresponding mean values and variances.

At BLOCK 506, the motif aggregator 316 may group the target bases to be analyzed by motif (that is, into groups in which all target bases of a group have the same motif). In some implementations, the motif aggregator 316 references a data structure that specifies the motif parameters (e.g. a first number of adjacent bases sequentially preceding the target base, and a second number of adjacent bases sequentially following the target base) that define the motifs. The grouping can be done stepwise, first grouping samples within individual runs and then grouping across all runs. When grouping runs, the error rates can be weighted by the number of occurrences of the motif in each run. In other implementations, the error rates are averaged without weighting. The statistics engine 314 may determine motif-specific mean or estimated replication error rates, and variances thereof, based on the determined groups.

At BLOCK 508, the initial count estimator 318 may use Equation 22 to determine a plurality of initial count estimates for each base being analyzed. The initial count estimator 318 (or, in some implementations, the statistics engine 314) may determine a plurality of estimates or mean values for the initial count, and variances thereof, over positions belonging to the same sequenced genetic fragment, and may assign those values to each position in the genetic fragment. Those values may be used by the efficiency updater 310 to update an initial efficiency estimate, as described herein.

At BLOCK 510, the error analysis system 300 determines an expectation and a variance of a total count for each of the target bases and an expectation and a variance of an error count, and at BLOCK 512, the error analysis system 300 determines a distribution for the mutation fraction based on the expectation and the variance of the total count for each of the target bases and the expectation and the variance of the error count. This can include, for a grid of candidate mutation fractions $\theta \in [0, \tau_{max}]$ (where $\tau_{max}$ is ideally 1, but for practical purposes it suffices to set $\tau_{max} \approx 0.15$), plugging the estimated efficiency and error parameters into Equations (6) and (7) to compute the likelihood $L(\theta)$ of the test data using the beta-binomial model in (1). The process can further include finding the maximum likelihood estimate of θ, $\hat{\theta}_{MLE} := \arg\max_\theta L(\theta)$, and computing the confidence score as

$C = {\frac{L\left( {\overset{\hat{}}{\theta}}_{MLE} \right)}{{L\left( {\overset{\hat{}}{\theta}}_{MLE} \right)} + {L(0)}}.}$

The distribution determiner 320 may select the highest-likelihood mutation fraction, and may select the corresponding mutation fraction distribution as the mutation fraction distribution for an analyzed sample. Thus, a mutation fraction and a distribution thereof may be determined using a motif-specific approach.

The above-described embodiments can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software, or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. For example, the error analysis system 300 can be executed on a computer or specialty logic system that includes one or more processors.

Also, a computer may have one or more input and output devices. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound-generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, a computer may receive input information through speech recognition or in other audible formats.

Such computers may be interconnected by one or more networks in any suitable form, including a local area network or a wide area network, such as an enterprise network, an intelligent network (IN), or the Internet. Such networks may be based on any suitable technology, may operate according to any suitable protocol, and may include wireless networks, wired networks, or fiber optic networks.

A computer employed to implement at least a portion of the functionality described herein may comprise a memory, one or more processing units (also referred to herein simply as “processors”), one or more communication interfaces, one or more display units, and one or more user input devices. The memory may comprise any computer-readable media, and may store computer instructions (also referred to herein as “processor-executable instructions”) for implementing the various functionalities described herein. The processing unit(s) may be used to execute the instructions. The communication interface(s) may be coupled to a wired or wireless network, bus, or other communication means and may therefore allow the computer to transmit communications to and/or receive communications from other devices. The display unit(s) may be provided, for example, to allow a user to view various information in connection with execution of the instructions. The user input device(s) may be provided, for example, to allow the user to make manual adjustments, make selections, enter data or various other information, and/or interact in any of a variety of manners with the processor during execution of the instructions.

The various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.

In this respect, various inventive concepts may be embodied as a computer-readable storage medium (or multiple computer-readable storage media) (e.g., a computer memory, one or more floppy discs, compact discs, optical discs, magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other non-transitory medium or tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the present disclosure discussed above. The computer-readable medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present disclosure as discussed above.

The terms “application” or “script” are used herein in a generic sense to refer to any type of computer code or set of processor-executable instructions that can be employed to program a computer or other processor to implement various aspects of embodiments as discussed above. Additionally, it should be appreciated that according to one aspect, one or more computer programs that when executed perform methods of the present disclosure need not reside on a single computer or processor, but may be distributed in a modular fashion amongst a number of different computers or processors to implement various aspects of the present disclosure.

Processor-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.

Also, data structures may be stored in computer-readable media in any suitable form. For simplicity of illustration, data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a computer-readable medium that conveys relationship between the fields. However, any suitable mechanism may be used to establish a relationship between information in fields of a data structure, including through the use of pointers, tags, or other mechanisms that establish relationship between data elements.

Also, various inventive concepts may be embodied as one or more methods, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.

While operations are depicted in the drawings in a particular order, such operations are not required to be performed in the particular order shown or in sequential order, and not all illustrated operations are required to be performed. Actions described herein can be performed in a different order.

The separation of various system components does not require separation in all implementations, and the described program components can be included in a single hardware or software product.

Method for Detecting Cancer-Associated Mutations

In a further aspect, the present disclosure provides a method for detecting a mutation associated with cancer, comprising: isolating cell-free DNA from a biological sample of a subject; amplifying from the isolated cell-free DNA a plurality of single-nucleotide variant (SNV) loci that comprise a plurality of target bases, wherein the SNV loci are known to be associated with cancer; sequencing the amplification products to obtain sequence reads of a plurality of motifs, wherein each motif comprises one of the plurality of target bases; and determining a mutation fraction distribution for each of the plurality of target bases and identifying a mutation associated with cancer based on the mutation fraction distribution. In some embodiments, the biological sample is selected from blood, serum, plasma, and urine. In some embodiments, at least 10, or at least 20, or at least 50, or at least 100, or at least 200, or at least 500, or at least 1,000 SNV loci known to be associated with cancer are amplified from the isolated cell-free DNA. In some embodiments, the amplification products are sequenced with a depth of read of at least 200, or at least 500, or at least 1,000, or at least 2,000, or at least 5,000, or at least 10,000, or at least 20,000, or at least 50,000, or at least 100,000. In some embodiments, the plurality of SNV loci are selected from SNV loci identified in the TCGA and COSMIC data sets for cancer.

In an additional aspect, the present disclosure provides a method for detecting a mutation associated with early relapse or metastasis of cancer, comprising: isolating cell-free DNA from a biological sample of a subject who has received treatment for a cancer; performing a multiplex amplification reaction to amplify from the isolated cell-free DNA a plurality of single-nucleotide variant (SNV) loci that comprise a plurality of target bases, wherein the SNV loci are patient-specific SNV loci associated with the cancer for which the subject has received treatment; sequencing the amplification products to obtain sequence reads of a plurality of motifs, wherein each motif comprises one of the plurality of target bases; and determining a mutation fraction distribution for each of the plurality of target bases and identifying a mutation associated with early relapse or metastasis of cancer based on the mutation fraction distribution. In some embodiments, the biological sample is selected from blood, serum, plasma, and urine. In some embodiments, the multiplex amplification reaction amplifies at least 4, or at least 8, or at least 16, or at least 32, or at least 64, or at least 128 patient-specific SNV loci associated with the cancer for which the subject has received treatment. In some embodiments, the amplification products are sequenced with a depth of read of at least 200, or at least 500, or at least 1,000, or at least 2,000, or at least 5,000, or at least 10,000, or at least 20,000, or at least 50,000, or at least 100,000. In some embodiments, the method comprises collecting and analyzing a plurality of biological samples from the patient longitudinally.

The terms “cancer” and “cancerous” refer to or describe the physiological condition in animals that is typically characterized by unregulated cell growth. A “tumor” comprises one or more cancerous cells. There are several main types of cancer. Carcinoma is a cancer that begins in the skin or in tissues that line or cover internal organs. Sarcoma is a cancer that begins in bone, cartilage, fat, muscle, blood vessels, or other connective or supportive tissue. Leukemia is a cancer that starts in blood-forming tissue, such as the bone marrow, and causes large numbers of abnormal blood cells to be produced and enter the blood. Lymphoma and multiple myeloma are cancers that begin in the cells of the immune system. Central nervous system cancers are cancers that begin in the tissues of the brain and spinal cord.

In some embodiments, the cancer comprises an acute lymphoblastic leukemia; acute myeloid leukemia; adrenocortical carcinoma; AIDS-related cancers; AIDS-related lymphoma; anal cancer; appendix cancer; astrocytomas; atypical teratoid/rhabdoid tumor; basal cell carcinoma; bladder cancer; brain stem glioma; brain tumor (including brain stem glioma, central nervous system atypical teratoid/rhabdoid tumor, central nervous system embryonal tumors, astrocytomas, craniopharyngioma, ependymoblastoma, ependymoma, medulloblastoma, medulloepithelioma, pineal parenchymal tumors of intermediate differentiation, supratentorial primitive neuroectodermal tumors and pineoblastoma); breast cancer; bronchial tumors; Burkitt lymphoma; cancer of unknown primary site; carcinoid tumor; carcinoma of unknown primary site; central nervous system atypical teratoid/rhabdoid tumor; central nervous system embryonal tumors; cervical cancer; childhood cancers; chordoma; chronic lymphocytic leukemia; chronic myelogenous leukemia; chronic myeloproliferative disorders; colon cancer; colorectal cancer; craniopharyngioma; cutaneous T-cell lymphoma; endocrine pancreas islet cell tumors; endometrial cancer; ependymoblastoma; ependymoma; esophageal cancer; esthesioneuroblastoma; Ewing sarcoma; extracranial germ cell tumor; extragonadal germ cell tumor; extrahepatic bile duct cancer; gallbladder cancer; gastric (stomach) cancer; gastrointestinal carcinoid tumor; gastrointestinal stromal cell tumor; gastrointestinal stromal tumor (GIST); gestational trophoblastic tumor; glioma; hairy cell leukemia; head and neck cancer; heart cancer; Hodgkin lymphoma; hypopharyngeal cancer; intraocular melanoma; islet cell tumors; Kaposi sarcoma; kidney cancer; Langerhans cell histiocytosis; laryngeal cancer; lip cancer; liver cancer; malignant fibrous histiocytoma bone cancer; medulloblastoma; medulloepithelioma; melanoma; Merkel cell carcinoma; Merkel cell skin carcinoma; mesothelioma; metastatic squamous neck cancer with occult primary; mouth cancer; multiple endocrine neoplasia syndromes; multiple myeloma; multiple myeloma/plasma cell neoplasm; mycosis fungoides; myelodysplastic syndromes; myeloproliferative neoplasms; nasal cavity cancer; nasopharyngeal cancer; neuroblastoma; Non-Hodgkin lymphoma; nonmelanoma skin cancer; non-small cell lung cancer; oral cancer; oral cavity cancer; oropharyngeal cancer; osteosarcoma; other brain and spinal cord tumors; ovarian cancer; ovarian epithelial cancer; ovarian germ cell tumor; ovarian low malignant potential tumor; pancreatic cancer; papillomatosis; paranasal sinus cancer; parathyroid cancer; pelvic cancer; penile cancer; pharyngeal cancer; pineal parenchymal tumors of intermediate differentiation; pineoblastoma; pituitary tumor; plasma cell neoplasm/multiple myeloma; pleuropulmonary blastoma; primary central nervous system (CNS) lymphoma; primary hepatocellular liver cancer; prostate cancer; rectal cancer; renal cancer; renal cell (kidney) cancer; renal cell cancer; respiratory tract cancer; retinoblastoma; rhabdomyosarcoma; salivary gland cancer; Sezary syndrome; small cell lung cancer; small intestine cancer; soft tissue sarcoma; squamous cell carcinoma; squamous neck cancer; stomach (gastric) cancer; supratentorial primitive neuroectodermal tumors; T-cell lymphoma; testicular cancer; throat cancer; thymic carcinoma; thymoma; thyroid cancer; transitional cell cancer; transitional cell cancer of the renal pelvis and ureter; trophoblastic tumor; ureter cancer; urethral cancer; uterine cancer; uterine sarcoma; vaginal cancer; vulvar cancer; Waldenstrom macroglobulinemia; or Wilms' tumor.

In certain examples, the methods include identifying a confidence value for each allele determination at each of the set of single nucleotide variant loci, which can be based at least in part on a depth of read for the loci. The confidence limit can be set to at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%. The confidence limit can be set at different levels for different types of mutations.
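Applying a confidence limit that differs by mutation type can be sketched as a simple filter. The mutation-type labels, the limit values (picked from the 75-99% range above), and the separate minimum-depth gate are all hypothetical choices for illustration; the text only states that confidence can depend in part on depth of read and that limits can vary by mutation type.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL per-type limits, chosen from the 75-99% range described above.
CONFIDENCE_LIMITS: Dict[str, float] = {"transition": 0.95, "transversion": 0.99}


def passing_calls(
    calls: List[Tuple[str, str, float, int]],
    limits: Dict[str, float] = CONFIDENCE_LIMITS,
    default_limit: float = 0.95,
    min_depth: int = 1000,  # hypothetical minimum depth of read
) -> List[str]:
    """Return loci whose call meets both the depth gate and its type's limit.

    Each call is (locus, mutation_type, confidence, depth_of_read).
    """
    kept = []
    for locus, mutation_type, confidence, depth in calls:
        limit = limits.get(mutation_type, default_limit)  # fall back for unlisted types
        if depth >= min_depth and confidence >= limit:
            kept.append(locus)
    return kept
```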

In any of the methods for detecting SNVs herein that include a ctDNA SNV amplification/sequencing workflow, improved amplification parameters for multiplex PCR can be employed. For example, where the amplification reaction is a PCR reaction, the annealing temperature can exceed the melting temperature by between 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10° C. on the low end of the range and 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15° C. on the high end of the range, for at least 10, 20, 25, 30, 40, 50, 60, 70, 75, 80, 90, 95, or 100% of the primers of the set of primers.

In certain embodiments, where the amplification reaction is a PCR reaction, the length of the annealing step in the PCR reaction is between 10, 15, 20, 30, 45, or 60 minutes on the low end of the range, and 15, 20, 30, 45, 60, 120, 180, or 240 minutes on the high end of the range. In certain embodiments, the primer concentration in the amplification reaction, such as the PCR reaction, is between 1 and 10 nM. Furthermore, in exemplary embodiments, the primers in the set of primers are designed to minimize primer dimer formation.

Accordingly, in an example of any of the methods herein that include an amplification step, the amplification reaction is a PCR reaction, the annealing temperature is between 1 and 10° C. greater than the melting temperature of at least 90% of the primers of the set of primers, the length of the annealing step in the PCR reaction is between 15 and 60 minutes, the primer concentration in the amplification reaction is between 1 and 10 nM, and the primers in the set of primers are designed to minimize primer dimer formation. In a further aspect of this example, the multiplex amplification reaction is performed under limiting primer conditions.
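The combined conditions in the example above can be checked programmatically before a run. The sketch below encodes only the numeric windows stated in the text (annealing 1-10 °C above Tm for at least 90% of primers, a 15-60 minute annealing step, 1-10 nM primer concentration); the function name and the idea of returning a list of violations are illustrative assumptions, and how each primer's Tm is estimated is left to the caller.

```python
from typing import List


def check_multiplex_conditions(
    annealing_c: float,
    primer_tms_c: List[float],
    annealing_minutes: float,
    primer_nM: float,
    min_fraction: float = 0.9,
) -> List[str]:
    """Return a list of violated conditions (empty if all conditions hold)."""
    problems = []
    if not primer_tms_c:
        return ["no primer melting temperatures supplied"]
    # Annealing temperature should be 1-10 degrees C above each primer's Tm.
    in_window = [tm for tm in primer_tms_c if 1.0 <= annealing_c - tm <= 10.0]
    if len(in_window) / len(primer_tms_c) < min_fraction:
        problems.append("annealing temperature not 1-10 C above Tm for >=90% of primers")
    if not 15 <= annealing_minutes <= 60:
        problems.append("annealing step outside 15-60 min")
    if not 1 <= primer_nM <= 10:
        problems.append("primer concentration outside 1-10 nM")
    return problems
```

For example, an annealing temperature of 65 °C with nine primers at a Tm of 60 °C and one at 66 °C, a 30-minute annealing step, and 5 nM primers passes, because exactly 90% of the primers fall inside the window.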

A sample analyzed in methods of the present invention, in certain illustrative embodiments, is a blood sample, or a fraction thereof. Methods provided herein, in certain embodiments, are specially adapted for amplifying DNA fragments, especially tumor DNA fragments that are found in circulating tumor DNA (ctDNA). Such fragments are typically about 160 nucleotides in length.

It is known in the art that cell-free nucleic acid (e.g., cfDNA) can be released into the circulation via various forms of cell death such as apoptosis, necrosis, autophagy, and necroptosis. The cfDNA is fragmented, and the size distribution of the fragments varies from 150-350 bp to >10,000 bp (see Kalnina et al. World J Gastroenterol. 2015 Nov. 7; 21(41): 11636-11653). For example, the size distributions of plasma DNA fragments in hepatocellular carcinoma (HCC) patients spanned a range of 100-220 bp in length, with a peak in count frequency at about 166 bp and the highest tumor DNA concentration in fragments of 150-180 bp in length (see Jiang et al. Proc Natl Acad Sci USA 112:E1317-E1325).

In an illustrative embodiment, the circulating tumor DNA (ctDNA) is isolated from blood collected in an EDTA-2Na tube after removal of cellular debris and platelets by centrifugation. The plasma samples can be stored at −80° C. until the DNA is extracted using, for example, the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany) (e.g., Hamakawa et al., Br J Cancer. 2015; 112:352-356). Hamakawa et al. reported a median concentration of extracted cell-free DNA across all samples of 43.1 ng per ml plasma (range 9.5-1338 ng/ml) and a mutant fraction range of 0.001-77.8%, with a median of 0.90%.
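To put these concentrations in context, a cfDNA mass can be converted to haploid genome equivalents and then to an expected number of mutant copies. This sketch assumes the commonly used approximation of about 3.3 pg of DNA per haploid human genome, and treats the mutant fraction as the fraction of genome equivalents carrying the mutation at a given locus; both are simplifying assumptions, not statements from the cited study.

```python
HAPLOID_GENOME_PG = 3.3  # approximate mass of one haploid human genome (assumption)


def genome_equivalents_per_ml(cfdna_ng_per_ml: float) -> float:
    """Convert a cfDNA concentration (ng/ml plasma) to haploid genome equivalents per ml."""
    return cfdna_ng_per_ml * 1000.0 / HAPLOID_GENOME_PG  # ng -> pg, then divide by pg/genome


def expected_mutant_copies(cfdna_ng_per_ml: float, plasma_ml: float,
                           mutant_fraction: float) -> float:
    """Expected mutant copies of one locus in the given plasma volume."""
    return genome_equivalents_per_ml(cfdna_ng_per_ml) * plasma_ml * mutant_fraction
```

Under these assumptions, the reported median of 43.1 ng/ml corresponds to roughly 13,000 genome equivalents per ml, and at the median mutant fraction of 0.90% about 118 mutant copies per ml of plasma would be expected, which illustrates why deep sequencing of many molecules is needed at low mutant fractions.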

Methods of the present invention, in certain embodiments, typically include a step of generating and amplifying a nucleic acid library from the sample (i.e., library preparation). During the library preparation step, the nucleic acids from the sample can have ligation adapters, often referred to as library tags or ligation adaptor tags (LTs), appended, where the ligation adapters contain a universal priming sequence, followed by a universal amplification. In an embodiment, this may be done using a standard protocol designed to create sequencing libraries after fragmentation. In an embodiment, the DNA sample can be blunt ended, and then an A can be added at the 3′ end. A Y-adaptor with a T-overhang can be added and ligated. In some embodiments, sticky ends other than an A or T overhang can be used. In some embodiments, other adaptors can be added, for example looped ligation adaptors. In some embodiments, the adaptors may have a tag designed for PCR amplification.

A number of the embodiments provided herein include detecting the SNVs in a ctDNA sample. Such methods, in illustrative embodiments, include an amplification step and a sequencing step (sometimes referred to herein as a “ctDNA SNV amplification/sequencing workflow”). In an illustrative example, a ctDNA amplification/sequencing workflow can include generating a set of amplicons by performing a multiplex amplification reaction on nucleic acids isolated from a sample of blood or a fraction thereof from an individual, such as an individual suspected of having cancer, wherein each amplicon of the set of amplicons spans at least one single nucleotide variant locus of a set of single nucleotide variant loci, such as an SNV locus known to be associated with cancer; and determining the sequence of at least a segment of each amplicon of the set of amplicons, wherein the segment comprises a single nucleotide variant locus. In this way, this exemplary method determines the single nucleotide variants present in the sample.

Exemplary ctDNA SNV amplification/sequencing workflows in more detail can include forming an amplification reaction mixture by combining a polymerase, nucleotide triphosphates, nucleic acid fragments from a nucleic acid library generated from the sample, and a set of primers that each bind an effective distance from a single nucleotide variant locus, or a set of primer pairs that each span an effective region that includes a single nucleotide variant locus. The single nucleotide variant locus, in exemplary embodiments, is one known to be associated with cancer. The workflow then includes subjecting the amplification reaction mixture to amplification conditions to generate a set of amplicons comprising at least one single nucleotide variant locus of a set of single nucleotide variant loci, preferably known to be associated with cancer; and determining the sequence of at least a segment of each amplicon of the set of amplicons, wherein the segment comprises a single nucleotide variant locus.
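Once amplicon reads have been sequenced, the base observed at each SNV locus can be tallied to obtain allele counts and a naive mutation fraction. This sketch assumes reads are already aligned without gaps and are given as `(start_position, sequence)` pairs on the reference coordinate system; real pipelines would work from an alignment file and apply base-quality filters, which are omitted here.

```python
from collections import Counter
from typing import Iterable, Tuple


def allele_counts(reads: Iterable[Tuple[int, str]], target_pos: int) -> Counter:
    """Count bases observed at target_pos across ungapped aligned reads."""
    counts: Counter = Counter()
    for start, seq in reads:
        offset = target_pos - start
        if 0 <= offset < len(seq):  # read covers the target base
            counts[seq[offset]] += 1
    return counts


def mutation_fraction(reads: Iterable[Tuple[int, str]], target_pos: int,
                      ref_base: str, alt_base: str) -> float:
    """Alternate-allele fraction among reads showing the ref or alt base."""
    counts = allele_counts(reads, target_pos)
    depth = counts[ref_base] + counts[alt_base]
    return counts[alt_base] / depth if depth else 0.0
```

This raw fraction does not account for sequencing or amplification errors; the motif-specific error model described earlier in this section is what turns such counts into a mutation fraction distribution.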

The effective distance of binding of the primers can be within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 125, or 150 base pairs of an SNV locus. The effective range that a pair of primers spans typically includes an SNV and is typically 160 base pairs or less, and can be 150, 140, 130, 125, 100, 75, 50, or 25 base pairs or less. In other embodiments, the effective range that a pair of primers spans is 20, 25, 30, 40, 50, 60, 70, 75, 100, 110, 120, 125, 130, 140, or 150 nucleotides from an SNV locus on the low end of the range, and 25, 30, 40, 50, 60, 70, 75, 100, 110, 120, 125, 130, 140, 150, 160, 170, 175, or 200 nucleotides on the high end of the range.
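A primer pair can be screened against these constraints with a short check. Here primers are given as inclusive `(start, end)` coordinates on the plus strand, the amplicon span is measured from the start of the forward primer to the end of the reverse primer, and the requirement that the SNV lie between the primers (not under a primer) is an added assumption, since a base covered by a primer is copied from the primer rather than from the template.

```python
from typing import Tuple


def primer_pair_targets_snv(fwd: Tuple[int, int], rev: Tuple[int, int], snv_pos: int,
                            max_span: int = 160, max_distance: int = 150) -> bool:
    """True if the pair spans the SNV within max_span and a primer binds within max_distance."""
    fwd_start, fwd_end = fwd
    rev_start, rev_end = rev
    span = rev_end - fwd_start + 1  # amplicon length including both primers
    if span > max_span:
        return False
    if not fwd_end < snv_pos < rev_start:  # assumption: SNV must lie between the primers
        return False
    nearest = min(snv_pos - fwd_end, rev_start - snv_pos)
    return nearest <= max_distance
```

The defaults use the 160 bp span and 150 bp binding distance quoted above; tighter values from the listed ranges can be passed in instead.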

Primer tails can improve the detection of fragmented DNA from universally tagged libraries. If the library tag and the primer tails contain a homologous sequence, hybridization can be improved (for example, melting temperature (Tm) is lowered) and primers can be extended if only a portion of the primer target sequence is in the sample DNA fragment. In some embodiments, 13 or more target-specific base pairs may be used. In some embodiments, 10 to 12 target-specific base pairs may be used. In some embodiments, 8 to 9 target-specific base pairs may be used. In some embodiments, 6 to 7 target-specific base pairs may be used.

In one embodiment, libraries are generated from the samples above by ligating adaptors to the ends of DNA fragments in the samples, or to the ends of DNA fragments generated from DNA isolated from the samples. The fragments can then be amplified using PCR, for example, according to the following exemplary protocol: 95° C., 2 min; 15×[95° C., 20 sec, 55° C., 20 sec, 68° C., 20 sec], 68° C. 2 min, 4° C. hold.
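The exemplary library amplification protocol can be written down as data, which makes it easy to check or to hand to thermocycler-control code. The `Step` structure and stage names are illustrative, the open-ended 4 °C hold is left out, and instrument ramp times between temperatures are not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


# (stage name, steps, repeat count); the final 4 C hold is open-ended and omitted.
PROTOCOL: List[Tuple[str, List[Step], int]] = [
    ("initial denaturation", [Step(95, 120)], 1),
    ("cycling", [Step(95, 20), Step(55, 20), Step(68, 20)], 15),
    ("final extension", [Step(68, 120)], 1),
]


def hold_time_seconds(protocol: List[Tuple[str, List[Step], int]]) -> int:
    """Sum of programmed hold times; ramp times between temperatures are ignored."""
    return sum(sum(s.seconds for s in steps) * repeats for _, steps, repeats in protocol)
```

For this protocol the programmed hold time is 120 + 15 × 60 + 120 = 1140 seconds (19 minutes), before ramping.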

Many kits and methods are known in the art for generation of libraries of nucleic acids that include universal primer binding sites for subsequent amplification, for example clonal amplification, and for subsequent sequencing. To help facilitate ligation of adapters, library preparation and amplification can include end repair and adenylation (i.e., A-tailing). Kits especially adapted for preparing libraries from small nucleic acid fragments, especially circulating free DNA, can be useful for practicing methods provided herein, for example, the NEXTflex Cell Free kits available from Bioo Scientific or the Natera Library Prep Kit (available from Natera, Inc., San Carlos, Calif.). However, such kits would typically be modified to include adaptors that are customized for the amplification and sequencing steps of the methods provided herein. Adaptor ligation can be performed using commercially available kits such as the ligation kit found in the AGILENT SURESELECT kit (Agilent, CA).

Target regions of the nucleic acid library generated from DNA isolated from the sample, especially a circulating free DNA sample for the methods of the present invention, are then amplified. For this amplification, a series of primers or primer pairs can be used, which can include between 5, 10, 15, 20, 25, 50, 100, 125, 150, 250, 500, 1000, 2500, 5000, 10,000, 20,000, 25,000, or 50,000 primers on the low end of the range and 15, 20, 25, 50, 100, 125, 150, 250, 500, 1000, 2500, 5000, 10,000, 20,000, 25,000, 50,000, 60,000, 75,000, or 100,000 primers on the upper end of the range, that each bind to one of a series of primer binding sites.

Primer designs can be generated with Primer3 (Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth B C, Remm M, Rozen S G (2012) “Primer3—new capabilities and interfaces.” Nucleic Acids Research 40(15):e115; and Koressaar T, Remm M (2007) “Enhancements and modifications of primer design program Primer3.” Bioinformatics 23(10):1289-91; source code available at primer3.sourceforge.net). Primer specificity can be evaluated by BLAST and added to existing primer design pipeline criteria:

Primer specificities can be determined using the BLASTn program from the ncbi-blast-2.2.29+ package. The task option “blastn-short” can be used to map the primers against the hg19 human genome. Primer designs can be determined as “specific” if the primer has fewer than 100 hits to the genome, the top hit is the target complementary primer binding region of the genome, and the top hit scores at least two higher than any other hit (score as defined by the BLASTn program). This can be done in order to have a unique hit to the genome and to not have many other hits throughout the genome.
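The specificity rule above can be applied to BLAST tabular output. This sketch assumes the hits for one primer were produced with the standard 12-column tabular format (`-outfmt 6`), in which column 9 and 10 are the subject start and end and column 12 is the bit score; using the bit score as the "score" is an assumption, as is describing the target binding region by chromosome name and inclusive coordinates.

```python
from typing import Dict, Iterable, Iterator, List


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Dict]:
    """Parse BLAST tabular output (-outfmt 6, default 12 columns)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield {
            "qseqid": f[0],
            "sseqid": f[1],
            "sstart": int(f[8]),
            "send": int(f[9]),
            "bitscore": float(f[11]),
        }


def is_specific(hits: List[Dict], target_chrom: str, target_start: int, target_end: int,
                max_hits: int = 100, min_margin: float = 2.0) -> bool:
    """Fewer than max_hits, top hit on target, and top hit >= min_margin above the next."""
    if not hits or len(hits) >= max_hits:
        return False
    ranked = sorted(hits, key=lambda h: h["bitscore"], reverse=True)
    top = ranked[0]
    lo, hi = sorted((top["sstart"], top["send"]))  # minus-strand hits have sstart > send
    if top["sseqid"] != target_chrom or lo < target_start or hi > target_end:
        return False
    return len(ranked) == 1 or top["bitscore"] - ranked[1]["bitscore"] >= min_margin
```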

The final selected primers can be visualized in IGV (James T. Robinson, Helga Thorvaldsdóttir, Wendy Winckler, Mitchell Guttman, Eric S. Lander, Gad Getz, Jill P. Mesirov. Integrative Genomics Viewer. Nature Biotechnology 29, 24-26 (2011)) and the UCSC browser (Kent W J, Sugnet C W, Furey T S, Roskin K M, Pringle T H, Zahler A M, Haussler D. The human genome browser at UCSC. Genome Res. 2002 June; 12(6):996-1006) using bed files and coverage maps for validation.

Methods described herein, in certain embodiments, include forming an amplification reaction mixture. The reaction mixture typically is formed by combining a polymerase, nucleotide triphosphates, nucleic acid fragments from a nucleic acid library generated from the sample, and a set of forward and reverse primers specific for target regions that contain SNVs. The reaction mixtures provided herein themselves form, in illustrative embodiments, a separate aspect of the invention.

An amplification reaction mixture useful for the present invention includes components known in the art for nucleic acid amplification, especially for PCR amplification. For example, the reaction mixture typically includes nucleotide triphosphates, a polymerase, and magnesium. Polymerases that are useful for the present invention can include any polymerase that can be used in an amplification reaction, especially those that are useful in PCR reactions. In certain embodiments, hot start Taq polymerases are especially useful. Amplification reaction mixtures useful for practicing the methods provided herein, such as AmpliTaq Gold master mix (Life Technologies, Carlsbad, Calif.), are available commercially.

Amplification (e.g., temperature cycling) conditions for PCR are well known in the art. The methods provided herein can include any PCR cycling conditions that result in amplification of target nucleic acids, such as target nucleic acids from a library. Non-limiting exemplary cycling conditions are provided in the Examples section herein.

There are many workflows that are possible when conducting PCR; some workflows typical to the methods disclosed herein are provided herein. The steps outlined herein are not meant to exclude other possible steps, nor do they imply that any of the steps described herein are required for the method to work properly. A large number of parameter variations or other modifications are known in the literature and may be made without affecting the essence of the invention.

In certain embodiments of the method provided herein, at least a portion, and in illustrative examples the entire sequence, of an amplicon, such as an outer primer target amplicon, is determined. Methods for determining the sequence of an amplicon are known in the art. Any of the sequencing methods known in the art, e.g., Sanger sequencing, can be used for such sequence determination. In illustrative embodiments, high throughput next-generation sequencing techniques (also referred to herein as massively parallel sequencing techniques) such as, but not limited to, those employed in MISEQ (ILLUMINA), HISEQ (ILLUMINA), ION TORRENT (LIFE TECHNOLOGIES), GENOME ANALYZER IIX (ILLUMINA), and GS FLX+ (ROCHE 454) can be used for sequencing the amplicons produced by the methods provided herein.

High throughput genetic sequencers are amenable to the use of barcoding (i.e., sample tagging with distinctive nucleic acid sequences) so as to identify specific samples from individuals, thereby permitting the simultaneous analysis of multiple samples in a single run of the DNA sequencer. The number of times a given region of the genome in a library preparation (or other nucleic acid preparation of interest) is sequenced (number of reads) will be proportional to the number of copies of that sequence in the genome of interest (or expression level in the case of cDNA-containing preparations). Biases in amplification efficiency can be taken into account in such quantitative determination.
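Barcode demultiplexing and an efficiency-adjusted read count can be sketched as follows. This assumes each sample barcode is an exact prefix of its reads and that no barcode is a prefix of another; real demultiplexers usually read barcodes from index reads and tolerate mismatches. The per-region `efficiency` factors are hypothetical inputs representing relative amplification efficiency, used to illustrate how bias can be taken into account.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(reads: Iterable[str], barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Assign each read to the sample whose barcode prefixes it; strip the barcode."""
    by_sample: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        for sample, barcode in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:  # no barcode matched
            by_sample["undetermined"].append(read)
    return dict(by_sample)


def efficiency_adjusted(counts: Dict[str, int], efficiency: Dict[str, float]) -> Dict[str, float]:
    """Divide raw read counts by a relative amplification efficiency (default 1.0)."""
    return {region: n / efficiency.get(region, 1.0) for region, n in counts.items()}
```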

Target Genes. Target genes of the present invention, in exemplary embodiments, are cancer-related genes. A cancer-related gene refers to a gene associated with an altered risk for a cancer or an altered prognosis for a cancer. Exemplary cancer-related genes that promote cancer include oncogenes; genes that enhance cell proliferation, invasion, or metastasis; genes that inhibit apoptosis; and pro-angiogenesis genes. Cancer-related genes that inhibit cancer include, but are not limited to, tumor suppressor genes; genes that inhibit cell proliferation, invasion, or metastasis; genes that promote apoptosis; and anti-angiogenesis genes.

An embodiment of the mutation detection method begins with the selection of the region of the gene that becomes the target. The region with known mutations is used to develop primers for mPCR-NGS to amplify and detect the mutation.

Methods provided herein can be used to detect virtually any type of mutation, especially mutations known to be associated with cancer, and most particularly the methods provided herein are directed to mutations, especially SNVs, associated with cancer. Exemplary SNVs can be in one or more of the following genes: EGFR, FGFR1, FGFR2, ALK, MET, ROS1, NTRK1, RET, HER2, DDR2, PDGFRA, KRAS, NF1, BRAF, PIK3CA, MEK1, NOTCH1, MLL2, EZH2, TET2, DNMT3A, SOX2, MYC, KEAP1, CDKN2A, NRG1, TP53, LKB1, and PTEN, which have been identified in various lung cancer samples as being mutated, having increased copy numbers, or being fused to other genes, and combinations thereof (Non-small-cell lung cancers: a heterogeneous set of diseases. Chen et al. Nat. Rev. Cancer. 2014 Aug. 14(8):535-551). In another example, the genes are those listed above for which SNVs have been reported, such as in the cited Chen et al. reference.

Other exemplary polymorphisms or mutations are in one or more of the following genes: TP53, PTEN, PIK3CA, APC, EGFR, NRAS, NF2, FBXW7, ERBBs, ATAD5, KRAS, BRAF, VEGF, EGFR, HER2, ALK, p53, BRCA, BRCA1, BRCA2, SETD2, LRP1B, PBRM, SPTA1, DNMT3A, ARID1A, GRIN2A, TRRAP, STAG2, EPHA3/5/7, POLE, SYNE1, C20orf80, CSMD1, CTNNB1, ERBB2, FBXW7, KIT, MUC4, ATM, CDH1, DDX11, DDX12, DSPP, EPPK1, FAM186A, GNAS, HRNR, KRTAP4-11, MAP2K4, MLL3, NRAS, RB1, SMAD4, TTN, ABCC9, ACVR1B, ADAM29, ADAMTS19, AGAP10, AKT1, AMBN, AMPD2, ANKRD30A, ANKRD40, APOBR, AR, BIRC6, BMP2, BRAT1, BTNL8, C12orf4, C1QTNF7, C20orf186, CAPRIN2, CBWD1, CCDC30, CCDC93, CD5L, CDC27, CDC42BPA, CDH9, CDKN2A, CHD8, CHEK2, CHRNA9, CIZ1, CLSPN, CNTN6, COL14A1, CREBBP, CROCC, CTSF, CYP1A2, DCLK1, DHDDS, DHX32, DKK2, DLEC1, DNAH14, DNAH5, DNAH9, DNASE1L3, DUSP16, DYNC2H1, ECT2, EFHB, RRN3P2, TRIM49B, TUBB8P5, EPHA7, ERBB3, ERCC6, FAM21A, FAM21C, FCGBP, FGFR2, FLG2, FLT1, FOLR2, FRYL, FSCB, GAB1, GABRA4, GABRP, GH2, GOLGA6L1, GPHB5, GPR32, GPX5, GTF3C3, HECW1, HIST1H3B, HLA-A, HRAS, HS3ST1, HS6ST1, HSPD1, IDH1, JAK2, KDM5B, KIAA0528, KRT15, KRT38, KRTAP21-1, KRTAP4-5, KRTAP4-7, KRTAP5-4, KRTAP5-5, LAMA4, LATS1, LMF1, LPAR4, LPPR4, LRRFIP1, LUM, LYST, MAP2K1, MARCH1, MARCO, MB21D2, MEGF10, MMP16, MORC1, MRE11A, MTMR3, MUC12, MUC17, MUC2, MUC20, NBPF10, NBPF20, NEK1, NFE2L2, NLRP4, NOTCH2, NRK, NUP93, OBSCN, OR11H1, OR2B11, OR2M4, OR4Q3, OR5D13, OR8I2, OXSM, PIK3R1, PPP2R5C, PRAME, PRF1, PRG4, PRPF19, PTH2, PTPRC, PTPRJ, RAC1, RAD50, RBM12, RGPD3, RGS22, ROR1, RP11-671M22.1, RP13-996F3.4, RP1L1, RSBN1L, RYR3, SAMD3, SCN3A, SEC31A, SF1, SF3B1, SLC25A2, SLC44A1, SLC4A11, SMAD2, SPTA1, ST6GAL2, STK11, SZT2, TAF1L, TAX1BP1, TBP, TGFBI, TIF1, TMEM14B, TMEM74, TPTE, TRAPPC8, TRPS1, TXNDC6, USP32, UTP20, VASN, VPS72, WASH3P, WWTR1, XPO1, ZFHX4, ZMIZ1, ZNF167, ZNF436, ZNF492, ZNF598, ZRSR2, ABL1, AKT2, AKT3, ARAF, ARFRP1, ARID2, ASXL1, ATR, ATRX, AURKA, AURKB, AXL, BAP1, BARD1, BCL2, BCL2L2, BCL6, BCOR, BCORL1, BLM, BRIP1, BTK, CARD11, CBFB, CBL, CCND1, CCND2, CCND3, CCNE1, CD79A, CD79B, CDC73, CDK12, CDK4, CDK6, CDK8, CDKN1B, CDKN2B, CDKN2C, CEBPA, CHEK1, CIC, CRKL, CRLF2, CSF1R, CTCF, CTNNA1, DAXX, DDR2, DOT1L, EMSY (C11orf30), EP300, EPHA3, EPHA5, EPHB1, ERBB4, ERG, ESR1, EZH2, FAM123B (WTX), FAM46C, FANCA, FANCC, FANCD2, FANCE, FANCF, FANCG, FANCL, FGF10, FGF14, FGF19, FGF23, FGF3, FGF4, FGF6, FGFR1, FGFR2, FGFR3, FGFR4, FLT3, FLT4, FOXL2, GATA1, GATA2, GATA3, GID4 (C17orf39), GNA11, GNA13, GNAQ, GNAS, GPR124, GSK3B, HGF, IDH1, IDH2, IGF1R, IKBKE, IKZF1, IL7R, INHBA, IRF4, IRS2, JAK1, JAK3, JUN, KAT6A (MYST3), KDM5A, KDM5C, KDM6A, KDR, KEAP1, KLHL6, MAP2K2, MAP2K4, MAP3K1, MCL1, MDM2, MDM4, MED12, MEF2B, MEN1, MET, MITF, MLH1, MLL, MLL2, MPL, MSH2, MSH6, MTOR, MUTYH, MYC, MYCL1, MYCN, MYD88, NF1, NFKBIA, NKX2-1, NOTCH1, NPM1, NRAS, NTRK1, NTRK2, NTRK3, PAK3, PALB2, PAX5, PBRM1, PDGFRA, PDGFRB, PDK1, PIK3CG, PIK3R2, PPP2R1A, PRDM1, PRKAR1A, PRKDC, PTCH1, PTPN11, RAD51, RAF1, RARA, RET, RICTOR, RNF43, RPTOR, RUNX1, SMARCA4, SMARCB1, SMO, SOCS1, SOX10, SOX2, SPEN, SPOP, SRC, STAT4, SUFU, TET2, TGFBR2, TNFAIP3, TNFRSF14, TOP1, TP53, TSC1, TSC2, TSHR, VHL, WISP3, WT1, ZNF217, ZNF703, and combinations thereof (Su et al., J Mol Diagn 2011, 13:74-84; DOI:10.1016/j.jmoldx.2010.11.010; and Abaan et al., “The Exomes of the NCI-60 Panel: A Genomic Resource for Cancer Biology and Systems Pharmacology”, Cancer Research, Jul. 15, 2013, which are each hereby incorporated by reference in its entirety). Exemplary polymorphisms or mutations can be in one or more of the following microRNAs: miR-15a, miR-16-1, miR-23a, miR-23b, miR-24-1, miR-24-2, miR-27a, miR-27b, miR-29b-2, miR-29c, miR-146, miR-155, miR-221, miR-222, and miR-223 (Calin et al. “A microRNA signature associated with prognosis and progression in chronic lymphocytic leukemia.” N Engl J Med 353:1793-801, 2005, which is hereby incorporated by reference in its entirety).

Amplification (e.g. PCR) Reaction Mixtures

Methods of the present invention, in certain embodiments, include forming an amplification reaction mixture. The reaction mixture typically is formed by combining a polymerase, nucleotide triphosphates, nucleic acid fragments from a nucleic acid library generated from the sample, a series of forward target-specific outer primers, and a first strand reverse outer universal primer. Another illustrative embodiment is a reaction mixture that includes forward target-specific inner primers instead of the forward target-specific outer primers, and amplicons from a first PCR reaction using the outer primers instead of nucleic acid fragments from the nucleic acid library. The reaction mixtures provided herein themselves form, in illustrative embodiments, a separate aspect of the invention. In illustrative embodiments, the reaction mixtures are PCR reaction mixtures. PCR reaction mixtures typically include magnesium.

In some embodiments, the reaction mixture includes ethylenediaminetetraacetic acid (EDTA), magnesium, tetramethylammonium chloride (TMAC), or any combination thereof. In some embodiments, the concentration of TMAC is between 20 and 70 mM, inclusive. While not meant to be bound to any particular theory, it is believed that TMAC binds to DNA, stabilizes duplexes, increases primer specificity, and/or equalizes the melting temperatures of different primers. In some embodiments, TMAC increases the uniformity in the amount of amplified products for the different targets. In some embodiments, the concentration of magnesium (such as magnesium from magnesium chloride) is between 1 and 8 mM.

The large number of primers used for multiplex PCR of a large number of targets may chelate a substantial amount of the magnesium (two primer phosphates chelate one magnesium ion). For example, if enough primers are used such that the concentration of phosphate from the primers is ~9 mM, then the primers may reduce the effective magnesium concentration by ~4.5 mM. In some embodiments, EDTA is used to decrease the amount of magnesium available as a cofactor for the polymerase, since high concentrations of magnesium can result in PCR errors, such as amplification of non-target loci. In some embodiments, the concentration of EDTA reduces the amount of available magnesium to between 1 and 5 mM (such as between 3 and 5 mM).
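A back-of-the-envelope version of this magnesium budget can be sketched in Python. Only the 2:1 phosphate-to-magnesium chelation ratio comes from the text above; the assumption that each primer of length L carries L − 1 backbone phosphates, and the primer count, per-primer concentration, primer length, and 8 mM total magnesium used below, are illustrative assumptions rather than values from this disclosure.

```python
"""Rough magnesium budget for a highly multiplexed primer pool (illustrative sketch)."""


def primer_phosphate_mM(n_primers: int, each_nM: float, length_nt: int) -> float:
    """Total primer backbone phosphate in mM.

    Assumption: an unmodified oligo of length L carries L - 1 phosphodiester
    phosphates (no 5' phosphate).
    """
    return n_primers * each_nM * (length_nt - 1) / 1e6  # nM -> mM


def free_mg_mM(total_mg_mM: float, phosphate_mM: float) -> float:
    """Mg2+ left after chelation at 2 phosphates per Mg2+, floored at zero."""
    return max(total_mg_mM - phosphate_mM / 2.0, 0.0)


# Hypothetical pool: 25,000 primers at 15 nM each, 25 nt long -> 9.0 mM phosphate,
# which ties up ~4.5 mM Mg2+, matching the example in the text.
phosphate = primer_phosphate_mM(25_000, 15.0, 25)
print(f"primer phosphate: {phosphate:.1f} mM")
print(f"free Mg2+ from 8 mM total: {free_mg_mM(8.0, phosphate):.1f} mM")
```

In a real reaction the dNTPs also bind magnesium, and any EDTA removes more; neither effect is modeled in this sketch.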

In some embodiments, the pH is between 7.5 and 8.5, such as between 7.5 and 8, 8 and 8.3, or 8.3 and 8.5, inclusive. In some embodiments, Tris is used at, for example, a concentration of between 10 and 100 mM, such as between 10 and 25 mM, 25 and 50 mM, 50 and 75 mM, or 25 and 75 mM, inclusive. In some embodiments, any of these concentrations of Tris are used at a pH between 7.5 and 8.5. In some embodiments, a combination of KCl and (NH₄)₂SO₄ is used, such as between 50 and 150 mM KCl and between 10 and 90 mM (NH₄)₂SO₄, inclusive. In some embodiments, the concentration of KCl is between 0 and 30 mM, between 50 and 100 mM, or between 100 and 150 mM, inclusive. In some embodiments, the concentration of (NH₄)₂SO₄ is between 10 and 50 mM, 50 and 90 mM, 10 and 20 mM, 20 and 40 mM, 40 and 60 mM, or 60 and 80 mM, inclusive. In some embodiments, the ammonium [NH₄⁺] concentration is between 0 and 160 mM, such as between 0 and 50, 50 and 100, or 100 and 160 mM, inclusive. In some embodiments, the sum of the potassium and ammonium concentrations ([K⁺]+[NH₄⁺]) is between 0 and 160 mM, such as between 0 and 25, 25 and 50, 50 and 150, 50 and 75, 75 and 100, 100 and 125, or 125 and 160 mM, inclusive. An exemplary buffer with [K⁺]+[NH₄⁺]=120 mM is 20 mM KCl and 50 mM (NH₄)₂SO₄. In some embodiments, the buffer includes 25 to 75 mM Tris, pH 7.2 to 8, 0 to 50 mM KCl, 10 to 80 mM ammonium sulfate, and 3 to 6 mM magnesium, inclusive. In some embodiments, the buffer includes 25 to 75 mM Tris pH 7 to 8.5, 3 to 6 mM MgCl₂, 10 to 50 mM KCl, and 20 to 80 mM (NH₄)₂SO₄, inclusive. In some embodiments, 100 to 200 Units/mL of polymerase are used. In some embodiments, 100 mM KCl, 50 mM (NH₄)₂SO₄, 3 mM MgCl₂, 7.5 nM of each primer in the library, 50 mM TMAC, and 7 μl DNA template in a 20 μl final volume at pH 8.1 is used.
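The combined monovalent cation concentration follows from stoichiometry: each KCl contributes one K⁺ and each (NH₄)₂SO₄ contributes two NH₄⁺. The minimal Python check below, which assumes complete dissociation of both salts, reproduces the 120 mM figure for the exemplary buffer and tests it against the 0 to 160 mM range stated above. The function names are illustrative only.

```python
def monovalent_cation_mM(kcl_mM: float, ammonium_sulfate_mM: float) -> float:
    """[K+] + [NH4+] in mM, assuming both salts dissociate completely."""
    k_plus = kcl_mM                        # KCl -> K+ + Cl-
    nh4_plus = 2.0 * ammonium_sulfate_mM   # (NH4)2SO4 -> 2 NH4+ + SO4(2-)
    return k_plus + nh4_plus


def within(value: float, low: float, high: float) -> bool:
    """Inclusive range check, matching the 'inclusive' ranges in the text."""
    return low <= value <= high


total = monovalent_cation_mM(kcl_mM=20, ammonium_sulfate_mM=50)
print(total, within(total, 0, 160))
```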

In some embodiments, a crowding agent is used, such as polyethylene glycol (PEG, such as PEG 8,000) or glycerol. In some embodiments, the amount of PEG (such as PEG 8,000) is between 0.1 and 20%, such as between 0.5 and 15%, 1 and 10%, 2 and 8%, or 4 and 8%, inclusive. In some embodiments, the amount of glycerol is between 0.1 and 20%, such as between 0.5 and 15%, 1 and 10%, 2 and 8%, or 4 and 8%, inclusive. In some embodiments, a crowding agent allows a lower polymerase concentration and/or a shorter annealing time to be used. In some embodiments, a crowding agent improves the uniformity of the depth of read (DOR) and/or reduces dropouts (undetected alleles).

In some embodiments, a polymerase with proofreading activity, a polymerase without (or with negligible) proofreading activity, or a mixture of a polymerase with proofreading activity and a polymerase without (or with negligible) proofreading activity is used. In some embodiments, a hot start polymerase, a non-hot start polymerase, or a mixture of a hot start polymerase and a non-hot start polymerase is used. In some embodiments, a HotStarTaq DNA polymerase is used (see, for example, QIAGEN catalog No. 203203). In some embodiments, AmpliTaq Gold® DNA Polymerase is used. In some embodiments, a PrimeSTAR GXL DNA polymerase, a high fidelity polymerase that provides efficient PCR amplification when there is excess template in the reaction mixture and when amplifying long products, is used (Takara Clontech, Mountain View, Calif.). In some embodiments, KAPA Taq DNA Polymerase or KAPA Taq HotStart DNA Polymerase is used; both are based on the single-subunit, wild-type Taq DNA polymerase of the thermophilic bacterium Thermus aquaticus. KAPA Taq and KAPA Taq HotStart DNA Polymerase have 5′→3′ polymerase and 5′→3′ exonuclease activities, but no 3′→5′ exonuclease (proofreading) activity (see, for example, KAPA BIOSYSTEMS catalog No. BK1000). In some embodiments, Pfu DNA polymerase is used; it is a highly thermostable DNA polymerase from the hyperthermophilic archaeon Pyrococcus furiosus. The enzyme catalyzes the template-dependent polymerization of nucleotides into duplex DNA in the 5′→3′ direction. Pfu DNA Polymerase also exhibits 3′→5′ exonuclease (proofreading) activity that enables the polymerase to correct nucleotide incorporation errors. It has no 5′→3′ exonuclease activity (see, for example, Thermo Scientific catalog No. EP0501). In some embodiments, Klentaq1 is used; it is a Klenow-fragment analog of Taq DNA polymerase and has no exonuclease or endonuclease activity (see, for example, DNA POLYMERASE TECHNOLOGY, Inc., St. Louis, Mo., catalog No. 100). In some embodiments, the polymerase is a PHUSION DNA polymerase, such as PHUSION High Fidelity DNA polymerase (M0530S, New England BioLabs, Inc.) or PHUSION Hot Start Flex DNA polymerase (M0535S, New England BioLabs, Inc.). In some embodiments, the polymerase is a Q5® DNA Polymerase, such as Q5® High-Fidelity DNA Polymerase (M0491S, New England BioLabs, Inc.) or Q5® Hot Start High-Fidelity DNA Polymerase (M0493S, New England BioLabs, Inc.). In some embodiments, the polymerase is a T4 DNA polymerase (M0203S, New England BioLabs, Inc.).

In some embodiments, between 5 and 600 Units/mL (Units per 1 mL of reaction volume) of polymerase is used, such as between 5 and 100, 100 and 200, 200 and 300, 300 and 400, 400 and 500, or 500 and 600 Units/mL, inclusive.

PCR Methods. In some embodiments, hot-start PCR is used to reduce or prevent polymerization prior to PCR thermocycling. Exemplary hot-start PCR methods include initial inhibition of the DNA polymerase, or physical separation of reaction components until the reaction mixture reaches higher temperatures. In some embodiments, slow release of magnesium is used. DNA polymerase requires magnesium ions for activity, so the magnesium is chemically separated from the reaction by binding to a chemical compound and is released into the solution only at high temperature. In some embodiments, non-covalent binding of an inhibitor is used. In this method, a peptide, antibody, or aptamer is non-covalently bound to the enzyme at low temperature and inhibits its activity. After incubation at elevated temperature, the inhibitor is released and the reaction starts. In some embodiments, a cold-sensitive Taq polymerase is used, such as a modified DNA polymerase with almost no activity at low temperature. In some embodiments, chemical modification is used. In this method, a molecule is covalently bound to the side chain of an amino acid in the active site of the DNA polymerase. The molecule is released from the enzyme by incubation of the reaction mixture at elevated temperature. Once the molecule is released, the enzyme is activated.

In some embodiments, the amount of template nucleic acids (such as an RNA or DNA sample) is between 20 and 5,000 ng, such as between 20 and 200, 200 and 400, 400 and 600, 600 and 1,000; 1,000 and 1,500; or 2,000 and 3,000 ng, inclusive.

In some embodiments, a QIAGEN Multiplex PCR Kit is used (QIAGEN catalog No. 206143). For 100×50 μl multiplex PCR reactions, the kit includes 2×QIAGEN Multiplex PCR Master Mix (providing a final concentration of 3 mM MgCl₂, 3×0.85 ml), 5× Q-Solution (1×2.0 ml), and RNase-Free Water (2×1.7 ml). The QIAGEN Multiplex PCR Master Mix (MM) contains a combination of KCl and (NH₄)₂SO₄ as well as the PCR additive Factor MP, which increases the local concentration of primers at the template. Factor MP stabilizes specifically bound primers, allowing efficient primer extension by HotStarTaq DNA Polymerase. HotStarTaq DNA Polymerase is a modified form of Taq DNA polymerase and has no polymerase activity at ambient temperatures. In some embodiments, HotStarTaq DNA Polymerase is activated by a 15-minute incubation at 95° C., which can be incorporated into any existing thermal-cycler program.

In some embodiments, 1×QIAGEN MM final concentration (the recommended concentration), 7.5 nM of each primer in the library, 50 mM TMAC, and 7 μl DNA template in a 20 μl final volume is used. In some embodiments, the PCR thermocycling conditions include 95° C. for 10 minutes (hot start); 20 cycles of 96° C. for 30 seconds, 65° C. for 15 minutes, and 72° C. for 30 seconds; followed by 72° C. for 2 minutes (final extension); and then a 4° C. hold.
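Because the annealing holds are long, it can help to tally the programmed run time. The Python sketch below encodes the program above as plain data; the `Step` class and `programmed_seconds` helper are hypothetical, and the sum deliberately ignores instrument ramp times and the open-ended 4° C. hold.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


def programmed_seconds(pre: List[Step], cycle: List[Step], n_cycles: int,
                       post: List[Step]) -> int:
    """Sum of programmed hold times; ramp times and the final 4 C hold are ignored."""
    per_cycle = sum(s.seconds for s in cycle)
    return (sum(s.seconds for s in pre) + n_cycles * per_cycle
            + sum(s.seconds for s in post))


# Program from the text: 95 C 10 min hot start; 20 x (96 C 30 s, 65 C 15 min, 72 C 30 s);
# 72 C 2 min final extension.
pre = [Step(95, 10 * 60)]
cycle = [Step(96, 30), Step(65, 15 * 60), Step(72, 30)]
post = [Step(72, 2 * 60)]

total = programmed_seconds(pre, cycle, 20, post)
print(f"{total} s = {total / 3600:.2f} h")
```

Substituting the longer annealing options of the following embodiment shows how quickly annealing dominates: 25 cycles at the 180-minute setting amount to 75 hours of annealing alone.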

In some embodiments, 2×QIAGEN MM final concentration (twice the recommended concentration), 2 nM of each primer in the library, 70 mM TMAC, and 7 μl DNA template in a 20 μl total volume is used. In some embodiments, up to 4 mM EDTA is also included. In some embodiments, the PCR thermocycling conditions include 95° C. for 10 minutes (hot start); 25 cycles of 96° C. for 30 seconds, 65° C. for 20, 25, 30, 45, 60, 120, or 180 minutes, and optionally 72° C. for 30 seconds; followed by 72° C. for 2 minutes (final extension); and then a 4° C. hold.

Another exemplary set of conditions includes a semi-nested PCR approach. The first PCR reaction uses a 20 μl reaction volume with 2×QIAGEN MM final concentration, 1.875 nM of each primer in the library (outer forward and reverse primers), and DNA template. Thermocycling parameters include 95° C. for 10 minutes; 25 cycles of 96° C. for 30 seconds, 65° C. for 1 minute, 58° C. for 6 minutes, 60° C. for 8 minutes, 65° C. for 4 minutes, and 72° C. for 30 seconds; and then 72° C. for 2 minutes, and then a 4° C. hold. Next, 2 μl of the resulting product, diluted 1:200, is used as input in a second PCR reaction. This reaction uses a 10 μl reaction volume with 1×QIAGEN MM final concentration, 20 nM of each inner forward primer, and 1 μM of reverse primer tag. Thermocycling parameters include 95° C. for 10 minutes; 15 cycles of 95° C. for 30 seconds, 65° C. for 1 minute, 60° C. for 5 minutes, 65° C. for 5 minutes, and 72° C. for 30 seconds; and then 72° C. for 2 minutes, and then a 4° C. hold. The annealing temperature can optionally be higher than the melting temperatures of some or all of the primers, as discussed herein (see U.S. patent application Ser. No. 14/918,544, filed Oct. 20, 2015, which is herein incorporated by reference in its entirety).
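The carry-over between the two rounds of the semi-nested protocol is simple arithmetic, sketched here in Python from the volumes and dilution stated above. It is bookkeeping only: it assumes ideal pipetting and says nothing about amplicon yield, and the helper names are hypothetical.

```python
def carried_over_ul(input_ul: float, dilution_factor: float) -> float:
    """Volume of undiluted first-round product contained in the input aliquot."""
    return input_ul / dilution_factor


def overall_dilution(input_ul: float, dilution_factor: float,
                     second_volume_ul: float) -> float:
    """Fold-dilution of first-round product in the second (inner-primer) reaction."""
    return second_volume_ul / carried_over_ul(input_ul, dilution_factor)


carried = carried_over_ul(2.0, 200.0)        # 2 ul of a 1:200 dilution
fold = overall_dilution(2.0, 200.0, 10.0)    # into a 10 ul second reaction
share = carried / 20.0                       # fraction of the 20 ul first reaction
print(f"{carried:.3f} ul of first-round product; {fold:.0f}-fold overall; "
      f"{share:.2%} of reaction 1")
```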

The melting temperature (T_(m)) is the temperature at which one-half (50%) of a DNA duplex of an oligonucleotide (such as a primer) and its perfect complement dissociates and becomes single-stranded DNA. The annealing temperature (T_(A)) is the temperature at which the annealing step of the PCR protocol is run. In prior methods, it is usually 5° C. below the lowest T_(m) of the primers used, so that nearly all possible duplexes are formed (such that essentially all the primer molecules bind the template nucleic acid). While this is highly efficient, lower temperatures also permit more nonspecific reactions. One consequence of having too low a T_(A) is that primers may anneal to sequences other than the true target, as internal single-base mismatches or partial annealing may be tolerated. In some embodiments of the present inventions, the T_(A) is higher than the T_(m), so that at a given moment only a small fraction of the targets have a primer annealed (such as only ~1-5%). If these get extended, they are removed from the equilibrium of annealing and dissociating primers and target (as extension quickly increases the T_(m) to above 70° C.), and a new ~1-5% of targets has primers annealed. Thus, by giving the reaction a long time for annealing, one can get ~100% of the targets copied per cycle.
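The argument that a small bound fraction still copies nearly every target, given enough time, can be illustrated with a deliberately simple model. In this Python sketch a "turnover" is a hypothetical interval in which a fraction `f_bound` of the not-yet-copied targets is annealed and extended; real association and dissociation kinetics are not modeled, and the turnover length is not specified in the text.

```python
import math


def fraction_copied(f_bound: float, n_turnovers: int) -> float:
    """Fraction of targets extended after n turnovers, if each turnover extends a
    fraction f_bound of the targets that have not yet been copied."""
    if not 0.0 < f_bound < 1.0:
        raise ValueError("f_bound must be in (0, 1)")
    return 1.0 - (1.0 - f_bound) ** n_turnovers


def turnovers_needed(f_bound: float, target: float) -> int:
    """Smallest n with fraction_copied(f_bound, n) >= target."""
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - f_bound))
    while fraction_copied(f_bound, n) < target:  # guard against float rounding
        n += 1
    return n


# The text's ~1-5% bound fraction, asking how many turnovers reach 99% copied.
for f in (0.01, 0.03, 0.05):
    print(f, turnovers_needed(f, 0.99))
```

Under this cartoon, the number of turnovers needed scales roughly as 1/f, which is consistent with the text's pairing of an above-T_(m) annealing temperature with long annealing steps.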

In various embodiments, the annealing temperature is greater than the melting temperature (such as the empirically measured or calculated T_(m)) of at least 25, 50, 60, 70, 75, 80, 90, 95, or 100% of the non-identical primers by an amount that is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13° C. on the low end of the range and 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 15° C. on the high end of the range. In various embodiments, the annealing temperature is between 1 and 15° C. (such as between 1 and 10, 1 and 5, 1 and 3, 3 and 5, 5 and 10, 5 and 8, 8 and 10, 10 and 12, or 12 and 15° C., inclusive) greater than the melting temperature (such as the empirically measured or calculated T_(m)) of at least 25; 50; 75; 100; 300; 500; 750; 1,000; 2,000; 5,000; 7,500; 10,000; 15,000; 19,000; 20,000; 25,000; 27,000; 28,000; 30,000; 40,000; 50,000; 75,000; 100,000; or all of the non-identical primers. In various embodiments, the annealing temperature is between 1 and 15° C. (such as between 1 and 10, 1 and 5, 1 and 3, 3 and 5, 3 and 8, 5 and 10, 5 and 8, 8 and 10, 10 and 12, or 12 and 15° C., inclusive) greater than the melting temperature (such as the empirically measured or calculated T_(m)) of at least 25%, 50%, 60%, 70%, 75%, 80%, 90%, 95%, or all of the non-identical primers, and the length of the annealing step (per PCR cycle) is between 5 and 180 minutes, such as between 15 and 120 minutes, 15 and 60 minutes, 15 and 45 minutes, or 20 and 60 minutes, inclusive.
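Checking a candidate annealing temperature against these criteria amounts to counting the primers whose T_(m) falls within the chosen window below T_(A). The Python sketch below does that count; the T_(m) values are hypothetical placeholders for measured or calculated values, and `fraction_in_window` is an illustrative name, not part of any disclosed tool.

```python
from typing import Sequence


def fraction_in_window(primer_tms: Sequence[float], t_anneal: float,
                       low: float = 1.0, high: float = 15.0) -> float:
    """Fraction of primers whose Tm is between `low` and `high` deg C below t_anneal (inclusive)."""
    if not primer_tms:
        raise ValueError("need at least one primer Tm")
    hits = sum(1 for tm in primer_tms if low <= t_anneal - tm <= high)
    return hits / len(primer_tms)


# Hypothetical Tm values (deg C); real values would be measured or calculated per primer.
tms = [55.0, 58.5, 60.0, 62.0, 64.5, 66.0]
frac = fraction_in_window(tms, t_anneal=65.0)
print(f"{frac:.0%} of primers have T_A 1-15 C above their Tm")
```

The result can then be compared against whichever percentage threshold (for example, at least 50% of the non-identical primers) a given embodiment specifies.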

Exemplary Multiplex PCR. In various embodiments, long annealing times (as discussed herein and exemplified in Example 12) and/or low primer concentrations are used. In fact, in certain embodiments, limiting primer concentrations and/or conditions are used. In various embodiments, the length of the annealing step is between 15, 20, 25, 30, 35, 40, 45, or 60 minutes on the low end of the range and 20, 25, 30, 35, 40, 45, 60, 120, or 180 minutes on the high end of the range. In various embodiments, the length of the annealing step (per PCR cycle) is between 30 and 180 minutes. For example, the annealing step can be between 30 and 60 minutes and the concentration of each primer can be less than 20, 15, 10, or 5 nM. In other embodiments, the primer concentration is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or 25 nM on the low end of the range, and 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, or 50 nM on the high end of the range.

At high levels of multiplexing, the solution may become viscous due to the large number of primers in solution. If the solution is too viscous, one can reduce the primer concentration to an amount that is still sufficient for the primers to bind the template DNA. In various embodiments, between 1,000 and 100,000 different primers are used and the concentration of each primer is less than 20 nM, such as less than 10 nM or between 1 and 10 nM, inclusive.

Experimental Section

The presently disclosed embodiments are described in the following Examples, which are set forth to aid in the understanding of the disclosure and should not be construed to limit in any way the scope of the disclosure as defined in the claims which follow thereafter. The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to use the described embodiments, and are not intended to limit the scope of the disclosure, nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperature, etc.), but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by volume, and temperature is in degrees Centigrade. It should be understood that variations in the methods as described may be made without changing the fundamental aspects that the experiments are meant to illustrate.

Example 1

Retrospective analysis of blood samples from kidney transplant recipients (292 plasma samples from 187 unique patients, with 8 samples excluded) was performed after patients were assessed for graft condition by biopsy. Biopsies were graded by Banff classification for T cell-mediated and antibody-mediated acute rejection (AR) or non-AR (borderline, stable, or other injury). The biopsy-analyzed samples were found to include 52 samples in acute rejection (AR) and 240 samples in non-acute rejection (non-AR), the latter comprising samples with borderline rejection, with other injury, or that were stable.

Circulating free DNA was extracted from 2 mL of plasma from each sample using the Qiagen cfDNA kit. The amount of cfDNA was then quantified using LabChip. Library preparation was accomplished using the Natera Panorama Library Prep Kit with the standard protocol, except that the library was amplified by 18 PCR cycles (as opposed to the standard 9 cycles). The amplified library was then purified using Ampure beads (Agencourt). The amplified library product was then quantified again using LabChip, and a quality control step was performed. This was followed by Panorama V2 OneSTAR, dilution, and BC-PCR.

The samples were then pooled for sequencing, purification (Qiagen Kit), quantification (Qubit), and quality control (Bioanalyzer).

The percentage of donor-derived cell-free DNA in the transplant recipient plasma was determined using massively multiplexed PCR, which targeted 13,392 single-nucleotide polymorphisms (SNPs), followed by NGS on a HiSeq 2500 machine (Illumina) for 50 cycles (28-29 samples per run, yielding 10-11M reads per sample).

Levels of dd-cfDNA were then correlated with rejection and transplant injury status and were found to demonstrate high capacity for detection of kidney transplant rejection. Specifically, it was found that a dd-cfDNA level above 1% (of total circulating free DNA) serves as a suitable threshold for classifying a kidney transplant as undergoing acute rejection (AR). See FIG. 2. For transplants not undergoing acute rejection, each of the stable, borderline rejection, and other injury categories was, on its own, below the 1% dd-cfDNA threshold level. See FIG. 3.

Further, among samples with dd-cfDNA greater than 1%, it was found that fewer than 1 in 20 were stable.

Group                    Stable     Borderline Rejection   Other Injury   Acute Rejection
>1% dd-cfDNA, n (%)      5 (4.4)    26 (23.0)              34 (30.1)      48 (42.5)
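The percentages in this table are shares of the 113 samples above the 1% threshold, not of each biopsy group. The short Python check below recomputes them from the tabulated counts and confirms the "fewer than 1 in 20 stable" statement; it uses only numbers from the table.

```python
# Counts of samples with dd-cfDNA > 1%, by biopsy group (from the table above).
above_1pct = {
    "Stable": 5,
    "Borderline rejection": 26,
    "Other injury": 34,
    "Acute rejection": 48,
}

total = sum(above_1pct.values())
shares = {group: round(100 * n / total, 1) for group, n in above_1pct.items()}
print(total, shares)
print("stable share below 1 in 20:", above_1pct["Stable"] / total < 1 / 20)
```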

Of the 52 samples undergoing acute rejection, 19 were classified by biopsy as undergoing antibody-mediated rejection (ABMR), 32 were classified as undergoing T cell-mediated rejection (TCMR), and 1 sample was classified as undergoing both types of rejection. The fraction of dd-cfDNA did not differ significantly between the ABMR and TCMR cohorts or between the borderline ABMR and TCMR cohorts. See FIG. 4.

Further, when the days since transplant were compared with the percentage level of dd-cfDNA and the rejection status of kidney transplants, the 1% dd-cfDNA threshold level was found to serve as a clinically relevant biomarker immediately following surgery. See FIG. 5.

Value was also found in repeated measurements within individual patients, as the change from a stable transplant to an injured transplant could be monitored over time. See FIG. 6.
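A minimal sketch of how the 1% cut-off could be applied to one patient's serial measurements follows. The strict ">1%" comparison mirrors the threshold described in this example; the measurement series and the `first_crossing` helper are hypothetical and are not part of the disclosed assay.

```python
from typing import Optional, Sequence, Tuple

AR_THRESHOLD_PCT = 1.0  # dd-cfDNA above 1% of total cfDNA -> classified as AR in this example


def is_ar(dd_cfdna_pct: float, threshold: float = AR_THRESHOLD_PCT) -> bool:
    """Classify one measurement; the cut-off is strictly greater than threshold."""
    return dd_cfdna_pct > threshold


def first_crossing(series: Sequence[Tuple[int, float]],
                   threshold: float = AR_THRESHOLD_PCT) -> Optional[int]:
    """Return the post-transplant day of the first measurement above threshold, if any."""
    for day, pct in sorted(series):
        if is_ar(pct, threshold):
            return day
    return None


# Hypothetical (day post-transplant, dd-cfDNA %) measurements for one patient.
patient = [(30, 0.35), (90, 0.42), (180, 1.8), (240, 2.6)]
print(first_crossing(patient))
```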

When the performance metrics of the current study were compared with those of a previous study (Bloom, R D et al., Cell-Free DNA and Active Rejection in Kidney Allografts, J. Am. Soc. Nephrol., 2017;28(7):2221-2232), the present methods were found to provide greater sensitivity, AUC, and NPV, though with lower specificity and PPV.

Performance Metric              Current Study (292 samples)   Bloom et al. (107 samples)
Sensitivity                     92% (n = 52)                  59% (n = 27)
Specificity                     73% (n = 240)                 85% (n = 80)
AUC                             0.90                          0.74
NPV (assuming 25% prevalence)   97%                           84%
PPV (assuming 25% prevalence)   53%                           61%
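The current-study sensitivity and specificity appear to follow from the >1% table earlier in this example: 48 of the 52 AR samples exceeded 1%, and 5 + 26 + 34 = 65 of the 240 non-AR samples did, leaving 175 non-AR samples at or below 1%. Applying Bayes' rule at the assumed 25% prevalence then reproduces the tabulated PPV and NPV, as the Python check below shows. This reading of the table counts is an assumption, since the calculation itself is not shown in the source.

```python
def ppv_npv(sens: float, spec: float, prevalence: float) -> tuple:
    """Positive and negative predictive values from Bayes' rule."""
    tp = sens * prevalence
    fn = (1 - sens) * prevalence
    tn = spec * (1 - prevalence)
    fp = (1 - spec) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


sens = 48 / 52                      # AR samples above 1%
spec = (240 - (5 + 26 + 34)) / 240  # non-AR samples at or below 1%
ppv, npv = ppv_npv(sens, spec, prevalence=0.25)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}, PPV {ppv:.0%}, NPV {npv:.0%}")
```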

As such, the presently disclosed assay offers certain technical advantages. For example, the assay disclosed herein comprises advanced cfDNA isolation and preparation, with size selection to eliminate background noise, and is able to filter PCR and NGS errors through advanced error modeling. Further, the present assay used more SNPs (13,392 vs. 266 disclosed in Bloom et al.) with advanced SNP selection.

Example 2. Optimizing Detection of Kidney Transplant Injury by Assessment of Donor-Derived Cell-Free DNA by Massively Multiplex PCR

Introduction

Precision medicine and personalized tailoring of immunosuppressive drug regimens can improve the current state of organ transplant management. Transplantation injuries are often detected late, given that invasive biopsies are best avoided, where possible, for the sake of the patient experience. Though advancements in immunosuppressive drugs, organ procurement methods, and human leukocyte antigen (HLA) typing have lowered the number of clinical- and biopsy-confirmed acute rejection episodes, sub-clinical acute rejection of kidney grafts remains a significant risk. Kidney transplant management is particularly challenging owing to the limited specificity of the serum creatinine assay, which, in addition to delaying detection of transplant injuries, makes immunosuppression dosage and adjustment far from personalized. Therefore, rapid and non-invasive detection and prediction of allograft injury/rejection holds promise for significantly improving the management of kidney transplantation patients.

Diagnosis of acute renal transplant rejection generally depends on an increase in serum creatinine levels or its algorithmic derivative, eGFR, which indicates altered renal filtration. Since there are many causes of baseline drift in renal filtration in these patients, biopsy is required for definitive diagnosis. Methods of estimating kidney rejection in allograft recipients based on serum creatinine or eGFR lack sufficient accuracy. However, biopsies are invasive and can be costly procedures, which limits their use in clinical practice. Furthermore, biopsy results are often affected by inter-reader variability and can lead to delayed diagnosis of acute rejection, after which irreversible organ damage has already taken place. Therefore, there is a current unmet need for a rapid, accurate, and noninvasive approach to detecting allograft rejection and/or injury, one which may require integration of the current “gold” standard morphological assessments with modern molecular diagnostic tools.

Donor-derived cell-free DNA (dd-cfDNA) detected in the blood of transplant recipients has been reported as a noninvasive marker to diagnose allograft injury/rejection, and holds promise for producing faster and more quantitative results than current diagnostic options. Recently, it was demonstrated that plasma levels of dd-cfDNA can discriminate active rejection status from stable organ function in kidney transplant recipients, using a 1% cutoff. We previously validated the clinical application of a targeted, single-nucleotide polymorphism (SNP)-based cell-free DNA assay targeting more than 10,000 loci as a successful screening tool for the detection of fetal chromosomal abnormalities, and show here that a similar approach targeting 13,392 SNPs can be used to evaluate differences in donor cfDNA burden across different transplant rejection injuries over time. This study uses a novel SNP-based mmPCR-NGS methodology to measure dd-cfDNA in renal transplant recipients for the detection of allograft rejection/injury without prior knowledge of donor genotypes.

Materials and Methods

Study Design

This study was a retrospective analysis of blood samples from kidney transplant recipients who had transplant surgeries at the University of California, San Francisco (UCSF) Medical Center. The study was approved by the institutional review board at the UCSF Medical Center. All patients provided written informed consent to participate in the research, in full adherence to the Declaration of Helsinki. The clinical and research activities being reported are consistent with the Principles of the Declaration of Istanbul as outlined in the Declaration of Istanbul on Organ Trafficking and Transplant Tourism.

Study Population and Samples

Blood samples were collected from male or female adult or young adult recipients of kidney transplants at various time points following transplantation surgery. Study samples were selected based on (a) whether adequate plasma was available and (b) whether the blood sample was associated with biopsy information that could be used in data analysis. Patients had received a kidney from related or unrelated living donors, or from unrelated deceased donors. Plasma samples were obtained from an existing biorepository, of which 53% were matched with a biopsy collected at the time of blood collection. Patients without a matching biopsy were categorized as STA; all non-STA patients were biopsy-matched.

Biopsy Samples

All kidney biopsies were analyzed in a blinded manner by a UCSF pathologist and were graded by the Banff classification for acute rejection (AR); intragraft C4d stains were performed to assess for acute humoral rejection. Transplant “injury” was defined as a >20% increase in serum creatinine from its previous steady-state baseline value and an associated biopsy that was classified as either AR, borderline change (BL), or other injury (OI; e.g., drug toxicity, viral infection). AR was defined, at minimum, by the following criteria: 1) T cell-mediated rejection (TCMR) consisting of either a tubulitis (t) score >2 accompanied by an interstitial inflammation (i) score >2, or a vascular changes (v) score >0; 2) C4d-positive antibody-mediated rejection (ABMR) consisting of positive donor-specific antibodies (DSA) with a glomerulitis (g) score >0 and/or peritubular capillaritis (ptc) score >0 or v>0, with unexplained acute tubular necrosis/thrombotic microangiopathy (ATN/TMA), with C4d=2; or 3) C4d-negative ABMR consisting of positive DSA with unexplained ATN/TMA, with g+ptc≥2 and C4d of 0 or 1. Borderline change (BL) was defined by t1+i0, t1+i1, or t2+i0 without explained cause (e.g., polyomavirus-associated nephropathy [PVAN]/infectious cause/ATN). Other criteria used for BL changes were g>0 and/or ptc>0, or v>0 without DSA, or C4d or positive DSA, or positive C4d without nonzero g or ptc scores. Normal (STA) allografts were defined by an absence of significant injury pathology as defined by the Banff schema. Samples were stratified into AR or non-AR groups (BL, STA, or OI) for analyses.

Dd-cfDNA Measurement in Blood Samples

Cell-free DNA was extracted from the plasma samples using the QIAamp Circulating Nucleic Acid Kit (Qiagen) and quantified on the LabChip NGS 5k kit (Perkin Elmer) following the manufacturer's instructions. Extracted cfDNA was used as input into library preparation using the Natera Library Prep kit, with a modification of 18 cycles of library amplification to bring the libraries to plateau. The purified libraries were quantified using LabChip NGS 5k. Target enrichment was accomplished using massively multiplexed PCR (mmPCR). This was performed using a modified version of a previously described method, with 13,392 single-nucleotide polymorphisms (SNPs) targeted. The amplicons were then sequenced on an Illumina HiSeq 2500 Rapid Run, 50 cycles single-end, with 10-11 million reads per sample.

Statistical Analyses of Dd-cfDNA, Creatinine, and eGFR

In each sample, dd-cfDNA levels were measured and correlated with rejection status; the results of dd-cfDNA analyses were compared with creatinine and eGFR levels. Where applicable, all tests are two-sided. Significance was always set at P<0.05. Because the distribution of dd-cfDNA levels was severely skewed among the groups of interest, these data were analyzed using a Kruskal-Wallis rank sum test followed by Dunn multiple comparison tests with Holm correction. The eGFR (with serum creatinine in mg/dL) was calculated as described previously. Briefly, eGFR = 186 × (serum creatinine)^(−1.154) × (age)^(−0.203) × [1.210 if Black] × [0.742 if female].
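The eGFR expression above (the four-variable MDRD Study equation with the 186 coefficient) translates directly into code. This Python sketch assumes serum creatinine in mg/dL and age in years, as the formula requires; the function name and the example patient are hypothetical and for illustration only.

```python
def egfr_mdrd(scr_mg_dl: float, age_years: float, female: bool, black: bool) -> float:
    """eGFR in mL/min/1.73 m^2 from the 186-coefficient MDRD equation."""
    if scr_mg_dl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    egfr = 186.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if black:
        egfr *= 1.210
    if female:
        egfr *= 0.742
    return egfr


# Hypothetical patient: creatinine 1.4 mg/dL (the AR-group median), age 50, male, non-Black.
print(round(egfr_mdrd(1.4, 50, female=False, black=False), 1))
```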

To evaluate the performance of dd-cfDNA level, creatinine, and eGFR score (mL/min/1.73 m²) as rejection markers, samples were separated into an AR group and a non-AR group (BL+STA+OI). Using this categorization, the sensitivity, specificity, PPV, and NPV of each marker were determined using the following AR classification cut-offs: >1% for dd-cfDNA, >1.8 mg/dL for creatinine, and <40.0 for eGFR. The area under the receiver operating characteristic (ROC) curve (AUC), an additional measure of discrimination between AR and non-AR, was also calculated for each marker. Confidence intervals for sensitivity and specificity were calculated using exact binomial tests (Clopper-Pearson). Confidence intervals for PPV and NPV were calculated with a normal approximation. The confidence interval for AUC was calculated using the DeLong method.
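The exact (Clopper-Pearson) intervals used for sensitivity and specificity can be computed without a statistics library by inverting the binomial CDF with bisection. This is a self-contained Python sketch of the method, not the R code used in the study. The example counts (48/52 and 175/240) are an assumed reading of the >1% table in Example 1, which appears to describe the same 292-sample cohort and cut-off; the resulting intervals are not reported in the source.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(g: Callable[[float], float], target: float, increasing: bool,
            tol: float = 1e-12) -> float:
    """Find p in [0, 1] with g(p) == target for a monotone g."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (g(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion k/n."""
    # Lower bound solves P(X >= k | p) = alpha/2; that tail probability increases with p.
    lower = 0.0 if k == 0 else _bisect(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, True)
    # Upper bound solves P(X <= k | p) = alpha/2; that CDF decreases with p.
    upper = 1.0 if k == n else _bisect(lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper


for label, k, n in [("sensitivity", 48, 52), ("specificity", 175, 240)]:
    lo, hi = clopper_pearson(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Summing binomial terms directly is fine at these sample sizes; for much larger n, a beta-distribution quantile function would be the more robust route.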

Subanalyses evaluated dd-cfDNA levels by Banff score for individual histological features (glomerulitis, allograft glomerulopathy, mesangial matrix increase, interstitial fibrosis, tubular atrophy, interstitial inflammation, total interstitial inflammation, tubulitis, atrophic tubulitis, peritubular capillaritis, arteriolar hyalinosis, alternative arteriolar hyalinosis, vascular intimal thickening, intimal arteritis, C4d staining). Elevated scores of glomerulitis, interstitial inflammation, total interstitial inflammation, tubulitis, peritubular capillaritis, and C4d staining correlated with elevated levels of dd-cfDNA, as assessed using a Kruskal-Wallis rank sum test followed by Dunn multiple comparison tests. Differences in dd-cfDNA levels by donor type (living related, living non-related, and deceased non-related) were also evaluated. Significance was determined using the Kruskal-Wallis rank sum test as described above. Inter- and intra-patient variability in dd-cfDNA over time was evaluated using a mixed effects model with a logarithmic transformation on dd-cfDNA. The 95% confidence intervals for the intra- and inter-patient standard deviations were calculated using a likelihood profile method.

All analyses were done using R 3.3.2 with the FSA (for Dunn tests), lme4 (for mixed effects modeling), and pROC (for AUC calculations) packages.

Results

Patients and Blood Samples

A total of 300 plasma samples were collected from 193 unique renal transplant recipients; of these, 8 samples from 6 patients could not be sequenced and were excluded from analyses. Among the 292 analyzed samples, 52 were collected from patients with biopsy-proven acute rejection (AR), 82 were from patients with biopsy-proven borderline rejection (BL), 73 were from patients with normal, stable allografts (STA), and 85 were from patients with biopsy indicating other injury (OI) (FIG. 13). Because it is desirable to detect the existence of AR versus any other condition, we defined non-AR as the group including all specimens that were classified as STA, BL, or OI. A summary of demographic information and sample characteristics is provided in Table A. All pathology samples were read at UCSF, verified at the same institution, and rated by all observers using the Banff criteria.

Dd-cfDNA in Plasma of Kidney Transplant Recipients

The amount of dd-cfDNA was significantly higher in the circulating plasma of the AR group (median=2.76%) than in the non-AR group (median=0.47%; P<0.0001) (FIG. 14A). The median level of dd-cfDNA was also significantly higher in the AR group than in each of the 3 individual non-rejection subgroups: BL (0.59%), STA (0.19%), and OI (0.70%; all comparisons, P<0.0001) (Table B). Donor-derived cfDNA levels were significantly lower in the STA group than in the BL or OI groups (P<0.0001). There was no significant difference in dd-cfDNA levels between the BL and OI groups (P=0.496) (Table B).

Creatinine and eGFR Levels

In contrast to dd-cfDNA, creatinine levels appeared to have less ability to discriminate between the AR and non-AR groups (FIG. 14B). The median creatinine level in the AR group (1.4 mg/dL) was significantly higher than that in the non-AR group (1.1 mg/dL; P=0.0024). However, unlike the dd-cfDNA results, there was no difference in median creatinine levels between the AR and BL groups (both 1.4 mg/dL; P=0.8653) (Table B). Median creatinine levels were significantly lower in the OI group (1.1 mg/dL) than in the AR group (1.4 mg/dL; P=0.0078) and significantly lower in the STA group (0.9 mg/dL) than in the BL group (1.4 mg/dL; P<0.0001); creatinine was numerically lower in the STA group than in the OI group (1.1 mg/dL), though the difference was not statistically significant (P=0.1887).

For samples with available eGFR scores (AR, n=52; non-AR, n=151 [BL, n=79; OI, n=65; STA, n=7]), median eGFR values were similar between the AR group (52.5) and the non-AR group (54.7; P=0.2379) (FIG. 14C). eGFR differed significantly between the AR group and the STA group (69.3; P=0.0125), but not between the AR and BL groups (52.0 vs 51.8; P=0.902) (Table B). Additionally, eGFR levels were significantly lower in the BL (51.8; P=0.0254) and OI (55.1; P=0.0413) groups than in the STA group.

Performance Estimates for Discriminatory Ability of Tests

With a cutoff of >1%, the mmPCR-NGS method had a 92.3% sensitivity (95% confidence interval [CI], 81.5%-97.9%) and 72.9% specificity (95% CI, 66.8%-78.4%) for detection of AR. Sensitivity and specificity values are shown over the range of dd-cfDNA cutoffs in FIG. 15A. The area under the curve (AUC) was 0.90 (95% CI, 0.85-0.95). Based on a 25% prevalence of rejection in an at-risk population, the positive predictive value (PPV) was projected to be 53.2% (95% CI, 47.7%-58.7%) and the negative predictive value (NPV) was projected to be 96.6% (95% CI, 69.8%-100%).
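The projected PPV and NPV follow from Bayes' theorem applied to the sensitivity, the specificity, and the assumed 25% prevalence. A minimal Python check (the helper name `projected_predictive_values` is hypothetical) reproduces the point estimates for dd-cfDNA and for creatinine; the confidence intervals require a separate calculation that is not shown here.

```python
def projected_predictive_values(sensitivity, specificity, prevalence):
    """Bayes' theorem: PPV and NPV of a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Point estimates reported in the text, at a 25% prevalence of rejection
for name, sens, spec in [("dd-cfDNA (>1% cutoff)", 0.923, 0.729),
                         ("creatinine (1.8 mg/dL cutoff)", 0.423, 0.837)]:
    ppv, npv = projected_predictive_values(sens, spec, prevalence=0.25)
    print(f"{name}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

This prints PPV 53.2% and NPV 96.6% for dd-cfDNA, and PPV 46.4% and NPV 81.3% for creatinine, matching the projected values reported for each test.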

Sensitivity and specificity were lower when creatinine and eGFR were used as discriminatory tests (FIG. 15B-C). Using a creatinine cutoff of 1.8 mg/dL for AR, sensitivity and specificity were 42.3% (95% CI, 28.7%-56.8%) and 83.7% (78.3%-88.1%), respectively, with an AUC of 0.63 (0.54-0.71). The projected PPV and NPV of creatinine were 46.4% (35.7%-57.0%) and 81.3% (50.5%-100%), respectively. For eGFR, using a cutoff score of <40, sensitivity was 38.8% (25.2%-53.8%) and specificity was 78.8% (71.4%-85.0%), with an AUC of 0.56 (0.46-0.66).

When comparing AR to STA only, the dd-cfDNA assay had a 92.3% sensitivity (95% confidence interval [CI], 81.5%-97.9%) and 93.2% specificity (95% CI, 84.7%-97.7%). Sensitivity and specificity values are shown over the range of dd-cfDNA cutoffs in FIG. 16. The area under the curve (AUC) was 0.951 (95% CI, 0.91-1.0).
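The cutoff sweep and AUC can be illustrated with a short NumPy sketch. The study computed AUCs with pROC in R; this version instead uses the Mann-Whitney formulation, which equals the area under the empirical (unsmoothed) ROC curve. The helper names, and the convention that a sample is called positive when dd-cfDNA exceeds the cutoff (matching the >1% rule), are assumptions; confidence intervals are not computed.

```python
import numpy as np


def sens_spec_at_cutoff(ar_scores, non_ar_scores, cutoff):
    """Call a sample AR-positive when its dd-cfDNA (%) exceeds the cutoff."""
    ar = np.asarray(ar_scores, dtype=float)
    non_ar = np.asarray(non_ar_scores, dtype=float)
    sensitivity = float(np.mean(ar > cutoff))
    specificity = float(np.mean(non_ar <= cutoff))
    return sensitivity, specificity


def roc_curve_points(ar_scores, non_ar_scores):
    """(cutoff, sensitivity, specificity) at every observed value used as a cutoff."""
    cutoffs = np.unique(np.concatenate([ar_scores, non_ar_scores]))
    return [(float(c), *sens_spec_at_cutoff(ar_scores, non_ar_scores, c)) for c in cutoffs]


def empirical_auc(ar_scores, non_ar_scores):
    """P(random AR score > random non-AR score), with ties counted as 1/2."""
    ar = np.asarray(ar_scores, dtype=float)[:, None]
    non_ar = np.asarray(non_ar_scores, dtype=float)[None, :]
    return float(np.mean((ar > non_ar) + 0.5 * (ar == non_ar)))
```

Passing the AR values and either all non-AR values or only the STA values reproduces the two comparisons described above (AR vs non-AR and AR vs STA).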

dd-cfDNA >1% by Rejection Status

Of the 292 patient samples, 113 (38.7%) had dd-cfDNA levels >1%. Of those, fewer than 1 in 20 were STA (5 samples [4.4%]); the remainder were AR (48 samples [42.5%]), OI (34 samples [30.1%]), or BL (26 samples [23.0%]).
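As a consistency check, these counts reproduce the sensitivity and specificity point estimates reported for the >1% cutoff, assuming each STA, BL, and OI sample is counted once in the non-AR denominator (73 + 82 + 85 = 240). The variable names below are illustrative only.

```python
# Samples per group and samples with dd-cfDNA >1%, taken from the counts in the text
ar_total, ar_above = 52, 48
sta_total, sta_above = 73, 5
bl_total, bl_above = 82, 26
oi_total, oi_above = 85, 34

non_ar_total = sta_total + bl_total + oi_total  # 240
non_ar_above = sta_above + bl_above + oi_above  # 65

sensitivity = ar_above / ar_total
specificity_non_ar = (non_ar_total - non_ar_above) / non_ar_total
specificity_sta_only = (sta_total - sta_above) / sta_total
fraction_above = (ar_above + non_ar_above) / (ar_total + non_ar_total)

print(f"sensitivity {sensitivity:.1%}")                    # 92.3%
print(f"specificity vs non-AR {specificity_non_ar:.1%}")   # 72.9%
print(f"specificity vs STA {specificity_sta_only:.1%}")    # 93.2%
print(f"samples above 1% {fraction_above:.1%}")            # 38.7%
```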

Relationship Between Dd-cfDNA and Acute Rejection Type

Of the 52 patients with biopsy-proven AR, 19 were classified as having antibody-mediated rejection (ABMR) and 32 as having T-cell-mediated rejection (TCMR); 1 patient had a combination of ABMR and TCMR. In addition, 18 patients had borderline ABMR (bABMR) and 64 had borderline TCMR (bTCMR). FIG. 17 shows the relationship between dd-cfDNA level and type of rejection. Median dd-cfDNA did not differ significantly between the ABMR (3.1%) and TCMR (2.4%; P=0.520) groups or between the bABMR (0.64%) and bTCMR (0.58%; P=0.420) groups. Significant differences were observed between ABMR and bABMR (P<0.001) and between TCMR and bTCMR (P<0.001), in alignment with the AR versus BL differences observed with dd-cfDNA.

Modelling of dd-cfDNA as a Function of Banff Score

The distribution of dd-cfDNA levels across Banff scores was evaluated for samples with a confirmed biopsy. Of 15 histological features evaluated, six showed significant differences in dd-cfDNA level by score: glomerulitis (P=0.0031), total interstitial inflammation (P=0.0001), interstitial inflammation (P<0.0001), peritubular capillaritis (P=0.0001), tubulitis (P=0.0082), and c4d staining (P=0.0049) (FIG. 18). For each of these six features, dd-cfDNA levels by score and summary statistics for score comparisons are shown in Tables C and D, respectively. Differences by interstitial inflammation score were highly significant: the dd-cfDNA level in the score-0 group was significantly lower than those in the score-1, score-2, and score-3 groups (FIG. 18). For glomerulitis and peritubular capillaritis, dd-cfDNA levels in the score-0 group were significantly lower than those in the score-3 and score-2 groups, respectively (FIG. 18; Table D).

Dd-cfDNA Levels by Donor Type

A Kruskal-Wallis rank sum test was used to assess the relationship between dd-cfDNA level and donor type (living related, living non-related, and deceased non-related). Patients were grouped by donor relationship and rejection status (AR/non-AR); for patients with multiple samples, the mean dd-cfDNA was used. Within each rejection status group, there was no significant difference in median dd-cfDNA level by donor type in either the AR group (P=0.677) or the non-AR group (P=0.463; FIG. 19).

dd-cfDNA Variability Over Time

Two subanalyses were conducted to evaluate the natural variability in dd-cfDNA over time. The first was a cross-sectional analysis of 60 plasma samples from 60 different patients, collected immediately following surgery (within 3 days [“0 months”]) or at 1, 3, 6, or 12 months post-surgery. Among STA patients, dd-cfDNA levels were lower at month 0 than at subsequent timepoints; however, for most STA samples, dd-cfDNA levels were <1% across all timepoints (FIG. 20A). For patients with AR, BL, or OI, nearly all were above the 1% dd-cfDNA threshold across all timepoints evaluated. To evaluate normal intra-patient variation in donor fraction, the second subanalysis longitudinally assessed 10 individual patients across 4 timepoints (the timing varied by patient). Overall, organ injury occurred at dd-cfDNA levels above 1%, and dd-cfDNA levels in STA and OI patients did not fluctuate over time (FIG. 20B).

To compare inter- and intra-patient variability, a linear mixed model was fitted to log-transformed dd-cfDNA levels, with the logarithmic transformation used to stabilize the variance. Using this approach and adjusting for time and AR/non-AR group, an intra-patient standard deviation of 0.25496 (95% CI, 0.1093-0.3481) and an inter-patient standard deviation of 0.4296 (95% CI, 0.3751-0.4915) were obtained. This resulted in an intraclass correlation coefficient of 0.2523, indicating high dissimilarity within patients.
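A random-intercept model of the kind described can be written as follows. This is a sketch under stated assumptions rather than the exact model specification: time is assumed to enter as a single linear covariate $t_{ij}$, $\mathrm{AR}_{ij}$ is an indicator for the AR group, $y_{ij}$ is the dd-cfDNA level of sample $j$ from patient $i$, $\sigma_u$ is the between-patient standard deviation, and $\sigma_\varepsilon$ is the residual within-patient standard deviation. The intraclass correlation coefficient is the share of total variance attributable to differences between patients, so a low value means that most of the variance lies within patients. In lme4 syntax this corresponds roughly to `lmer(log(dd_cfDNA) ~ time + AR + (1 | patient))` (column names hypothetical), with profile-likelihood intervals from `confint(fit, method = "profile")`.

```latex
\log y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2\,\mathrm{AR}_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)

\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
```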

Discussion

In this study, median dd-cfDNA was significantly higher in the AR group (2.76%) than in the non-AR group (0.47%; P<0.0001). Analysis of performance estimates demonstrated that the mmPCR-NGS method was able to discriminate active from non-active rejection status with an AUC of 0.90, a high sensitivity (92.3%), and a specificity of 72.9% at the AR cutoff of >1% dd-cfDNA. Based on a 25% prevalence of rejection, projected PPV and NPV were 53.2% and 96.6%, respectively. In contrast, serum creatinine and eGFR were generally less discriminatory; creatinine had a 42.3% sensitivity and 83.7% specificity, with projected PPV and NPV of 46.4% and 81.3%, respectively. Therefore, if static serum creatinine measurements were used as the sole clinical decision point, about 1 in 5 patients with a negative creatinine result would have AR and would not be referred for an indication biopsy. By comparison, the projected NPV of dd-cfDNA suggests that only 3-4 in 100 patients would miss an indication biopsy where it might be clinically necessary. Taken together, the superior performance of this SNP-based dd-cfDNA assay over the current standard of care for evaluating allograft rejection status holds promise for giving patients a greater opportunity for timely therapy in the case of allograft injury.

Levels of dd-cfDNA also discriminated AR from all three non-AR subgroups (STA, BL, and OI); median dd-cfDNA levels were significantly higher for samples with biopsy-proven AR (2.8%) than for BL (0.6%), OI (0.7%), and STA (0.2%) samples.

In a recent study that amplified hundreds of target SNPs in dd-cfDNA to detect active rejection in kidney allografts, the method discriminated AR from non-AR with an AUC of 0.74, 59% sensitivity, and 85% specificity. In comparison with that study, the novel dd-cfDNA test described in the current study showed a higher AUC (0.90) and greater sensitivity (92%). On the other hand, specificity (73%) was slightly lower in the current study, indicating that there may have been more false positives. This interpretation is supported by the rise in specificity to 93.2% when only the AR and STA groups were compared, which suggests that the false positives in the non-AR group were likely driven by the BL and/or OI groups.

Another important finding of this study was that the fraction of dd-cfDNA did not differ between the ABMR and TCMR groups (3.1% and 2.4%, respectively). These results are of interest considering that a previous study found significantly higher dd-cfDNA levels for ABMR-based rejections (2.9%) than for TCMR-based rejections (<1.2%). Although the assay used in that study also measured dd-cfDNA, the two assays differ in design. It is unclear whether that test could not differentiate AR from non-AR in cases of TCMR or whether the result was due to the small size of that group in that study (n=11), given that different TCMR groups may behave differently. Regardless, in the larger TCMR group evaluated in this study (n=32), dd-cfDNA levels appear to accurately discriminate AR from non-AR in both the ABMR and TCMR groups. In addition, dd-cfDNA levels were approximately 0.6% in both borderline ABMR and borderline TCMR, suggesting that the test may be sensitive enough to discriminate borderline cases from more severe cases in both groups.

One barrier to widespread clinical use of dd-cfDNA as a diagnostic tool for monitoring organ transplants has been the difficulty of measuring dd-cfDNA in certain cases, such as when the donor genotype is unknown or when the donor is a close relative. Given the design of the assay used here, dd-cfDNA can be quantified without prior recipient or donor genotyping. Further, no computational adjustment is needed based on whether the donor is related to the recipient. In this study, dd-cfDNA levels were similar across all donor types (living related, living non-related, and deceased non-related) within both the AR and non-AR categories.

This study is a retrospective analysis of archived samples from a single center. However, the central geographical area enabled all biopsies to be assessed by a single pathologist, which may have helped minimize variability in biopsy classification. Samples were selected based on the availability of biopsy information, which led to missing information for some patient samples and may have affected analyses. For example, because of limited demographic information for some patients, eGFR could not be calculated for all samples; this reduced the number of STA samples in the non-AR group for this marker, which may have contributed to the lack of a significant difference between the AR and non-AR groups. Importantly, all experimenters were kept blinded during data generation. Finally, the retrospective study design may have led to differences in patient characteristics across the rejection groups. Although the STA group was enriched with younger patients compared with the other groups, this is not surprising, as younger patients are better suited immunologically to tolerate transplanted organs than older patients; further, the age differences likely did not affect the validity of the study findings.

Strengths of this study include the variety of patient samples in the non-AR group, which comprised not only STA but also BL and OI samples. This allowed additional analyses, which found that dd-cfDNA was significantly different in the AR group versus the BL and OI groups. Additional subanalyses by type of AR (ABMR and TCMR) and by donor type demonstrated that dd-cfDNA levels were able to discriminate AR versus non-AR in a variety of patient types. Further, the SNP-based mmPCR methodology used has been validated with over a million samples in fetal cfDNA determinations; evidence indicates that it is highly sensitive and specific for detecting rare or minor nucleic acid fractions in an in vivo plasma mixture. Finally, the inclusion of longitudinal data enabled a unique evaluation of the natural variability of dd-cfDNA in transplant patients over time. Inter-patient variability data demonstrated that between 0 and 12 months post-surgery, most patients with STA biopsies had dd-cfDNA levels below 1%. This suggests that the dd-cfDNA test may be used immediately after surgery to differentiate whether a patient is stable or showing signs of AR, BL, or OI. Intra-patient variability data demonstrated that the results of the assay are generally consistent over time. Taken together, these data suggest that this test can offer not only routine monitoring of the same patients but also a reliable determination of rejection status for a variety of patients at any time point post-surgery.

In conclusion, this study validates the use of dd-cfDNA in the blood as an accurate marker of kidney injury/rejection. This rapid, accurate, and noninvasive technology may detect significant renal injury in select patients better than the current standard of care and therefore offers the potential for better management and survival of kidney allografts and better recipient renal function.

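TABLE A Demographics and Characteristics^(a)

The Stable, Borderline AR, and Other Injury Phenotype columns together make up the non-acute rejection group.

| Characteristic | Acute Rejection (52 samples) | Stable (81 samples)^(b) | Borderline AR (82 samples) | Other Injury Phenotype (85 samples)^(c) |
|---|---|---|---|---|
| Recipient age, yr, mean ± SD | 42.94 ± 15.39 | 18.41 ± 11.42 | 46.51 ± 14.20 | 45.81 ± 20.85 |
| Recipient age, yr, range | 4-69 | 2-70 | 5-74 | 3-80 |
| Sex, no. (%): male | 24 (46.2) | 46 (56.8) | 40 (48.8) | 33 (38.8) |
| Sex, no. (%): female | 28 (53.8) | 34 (42.0) | 42 (51.2) | 52 (61.2) |
| Sex, no. (%): unknown | 0 (0) | 1 (1.2) | 0 (0) | 0 (0) |
| Ethnicity, no. (%): Hispanic or Latino | 18 (34.5) | 3 (3.7) | 29 (35.4) | 23 (27.1) |
| Ethnicity, no. (%): not Hispanic or Latino | 31 (59.6) | 4 (4.9) | 50 (61) | 42 (49.4) |
| Ethnicity, no. (%): unknown | 3 (3.7) | 74 (91.4) | 3 (3.7) | 20 (23.5) |
| Race group, no. (%): White or Caucasian | 15 (28.8) | 1 (1.2) | 14 (17.1) | 22 (25.9) |
| Race group, no. (%): Black or African American | 6 (11.5) | 1 (1.2) | 15 (18.3) | 7 (8.2) |
| Race group, no. (%): Asian or Pacific Islander | 8 (15.4) | 1 (1.2) | 18 (22) | 4 (4.7) |
| Race group, no. (%): other/not reported | 23 (44.2) | 78 (96.3) | 35 (42.7) | 52 (61.2) |
| Recipient weight, kg, mean ± SD | 82.0 ± 19.9 | 64.8 ± 11.2 | 78.6 ± 18.3 | 81.2 ± 18.9 |
| Recipient weight, kg, range | 45-119 | 50-76 | 46-134 | 47-125 |
| Recipient weight, unknown, no. | 10 | 75 | 9 | 23 |
| DSA positive, no. (%): yes | 18 (34.6) | 0 (0) | 19 (23.2) | 4 (4.7) |
| DSA positive, no. (%): no | 18 (34.6) | 1 (1.2) | 53 (64.6) | 20 (23.5) |
| DSA positive, no. (%): not recorded | 16 (30.8) | 80 (98.8) | 10 (12.2) | 61 (71.8) |
| Indication for renal transplantation, no. (%): glomerulonephritis | 1 (1.9) | 4 (4.9) | 2 (2.4) | 3 (3.5) |
| Indication for renal transplantation, no. (%): focal segmental glomerulosclerosis | 6 (11.5) | 4 (4.9) | 10 (12.2) | 2 (12.2) |
| Indication for renal transplantation, no. (%): diabetes mellitus | 1 (1.9) | 1 (1.2) | 16 (19.5) | 22 (25.9) |
| Indication for renal transplantation, no. (%): thin basement membrane nephropathy | 4 (7.7) | 0 (0) | 2 (2.4) | 3 (3.5) |
| Indication for renal transplantation, no. (%): polycystic kidney disease | 5 (9.6) | 3 (3.7) | 7 (8.5) | 7 (8.2) |
| Indication for renal transplantation, no. (%): solitary kidney | 0 (0) | 0 (0) | 3 (3.7) | 0 (0) |
| Indication for renal transplantation, no. (%): hypertension | 5 (9.6) | 1 (1.2) | 13 (15.9) | 5 (5.9) |
| Indication for renal transplantation, no. (%): IgA nephropathy | 7 (13.5) | 0 (0) | 8 (9.8) | 0 (0) |
| Indication for renal transplantation, no. (%): lupus nephritis | 2 (3.8) | 0 (0) | 1 (1.2) | 3 (3.5) |
| Indication for renal transplantation, no. (%): ANCA-vasculitis | 1 (1.9) | 0 (0) | 2 (2.4) | 0 (0) |
| Indication for renal transplantation, no. (%): other/unknown | 19 (38.5) | 68 (84.0) | 18 (22.0) | 40 (47.1) |
| Donor source, no. (%): living related | 2 (3.8) | 2 (2.5) | 9 (11) | 6 (7.1) |
| Donor source, no. (%): living unrelated | 4 (7.7) | 44 (54.3) | 19 (23.2) | 31 (36.5) |
| Donor source, no. (%): deceased unrelated | 46 (88.5) | 35 (43.2) | 54 (65.9) | 48 (56.5) |

^(a) Characteristics and demographic information are based on all samples; data reflect multiple samples for some patients.
^(b) Of the 81 samples linked to stable biopsy, 8 could not be sequenced and were excluded from analyses.
^(c) Other injury patients had other causes of graft dysfunction: chronic allograft nephropathy (57 samples), drug toxicity (18 samples), BK nephritis (4 samples), acute tubular necrosis (3 samples), and transplant glomerulopathy (3 samples).
DSA, donor-specific antibodies; SD, standard deviation.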

TABLE B Summary Statistics for dd-cfDNA, Creatinine, and eGFR Tests

The Stable, Borderline, and Other Injury columns together make up the non-acute rejection group.

| Parameter | Acute Rejection | Stable | Borderline | Other Injury |
|---|---|---|---|---|
| dd-cfDNA, number of samples | 52 | 73 | 82 | 85 |
| dd-cfDNA, mean (SD), % | 3.08 (2.14) | 0.43 (0.85) | 0.83 (0.77) | 1.05 (1.11) |
| dd-cfDNA, median (range), % | 2.76 (0.1-12.6) | 0.19 (0.0-5.4) | 0.59 (0.03-3.9) | 0.70 (0.03-6.8) |
| Creatinine, number of samples | 52 | 72 | 82 | 85 |
| Creatinine, mean (SD), mg/dL | 1.76 (1.11) | 1.81 (2.66) | 1.62 (0.98) | 1.33 (1.06) |
| Creatinine, median (range), mg/dL | 1.41 (0.1-6.8) | 0.91 (0.1-14.1) | 1.40 (0.3-7.0) | 1.10 (0.1-7.8) |
| eGFR, number of samples | 49 | 7 | 79 | 65 |
| eGFR, mean (SD), score | 50.5 (22.6) | 76.52 (23.3) | 53.1 (20.7) | 54.4 (19.5) |
| eGFR, median (range), score | 51.99 (8-100) | 69.25 (47-109) | 51.82 (6-109) | 55.07 (7-106) |

dd-cfDNA, donor-derived cell-free DNA; eGFR, estimated glomerular filtration rate.

TABLE C Donor-derived cfDNA Levels in Six Histological Features with Significant Differences in dd-cfDNA by Banff Score

| Variable | Score | Count | Mean dd-cfDNA (%) | StdDev dd-cfDNA (%) | Median dd-cfDNA (%) |
|---|---|---|---|---|---|
| Glomerulitis | 0 | 177 | 1.048542373 | 1.602791733 | 0.415 |
| Glomerulitis | 1 | 32 | 1.153 | 1.067370359 | 0.6345 |
| Glomerulitis | 2 | 9 | 2.089888889 | 1.169621674 | 2.592 |
| Glomerulitis | 3 | 7 | 3.719714286 | 2.630592057 | 3.666 |
| Glomerulitis | N/A | 67 | 1.217925373 | 1.097801678 | 0.703 |
| Interstitial Inflammation | 0 | 86 | 0.758093023 | 0.895235265 | 0.3495 |
| Interstitial Inflammation | 1 | 26 | 1.631115385 | 1.329326997 | 1.13 |
| Interstitial Inflammation | 2 | 16 | 1.6524375 | 0.794761681 | 1.7085 |
| Interstitial Inflammation | 3 | 10 | 3.6249 | 2.989179205 | 2.8375 |
| Interstitial Inflammation | N/A | 154 | 1.160019481 | 1.594593679 | 0.5465 |
| Total Interstitial Inflammation | 0 | 67 | 0.752522388 | 0.965722172 | 0.32 |
| Total Interstitial Inflammation | 1 | 33 | 1.225666667 | 1.048161684 | 0.843 |
| Total Interstitial Inflammation | 2 | 24 | 1.480333333 | 1.171649809 | 1.2485 |
| Total Interstitial Inflammation | 3 | 15 | 3.084 | 2.541016866 | 2.009 |
| Total Interstitial Inflammation | N/A | 153 | 1.152169935 | 1.596842599 | 0.544 |
| Tubulitis | 0 | 118 | 0.899567797 | 1.066300911 | 0.5555 |
| Tubulitis | 1 | 17 | 1.936117647 | 1.73272967 | 1.562 |
| Tubulitis | 2 | 9 | 3.523111111 | 2.162920505 | 3.775 |
| Tubulitis | 3 | 3 | 2.033666667 | 0.742930234 | 1.809 |
| Tubulitis | N/A | 145 | 1.186648276 | 1.635198975 | 0.544 |
| Peritubular Capillaritis | 0 | 124 | 0.811048387 | 1.235469167 | 0.346 |
| Peritubular Capillaritis | 1 | 78 | 1.499705128 | 1.841726957 | 0.954 |
| Peritubular Capillaritis | 2 | 13 | 2.134153846 | 1.80764155 | 2.016 |
| Peritubular Capillaritis | 3 | 7 | 2.865 | 2.936962093 | 1.809 |
| Peritubular Capillaritis | N/A | 70 | 1.194142857 | 1.068454149 | 0.703 |
| C4d Staining | 0 | 184 | 1.104027174 | 1.592462534 | 0.551 |
| C4d Staining | 1 | 12 | 2.905666667 | 2.066374442 | 2.9775 |
| C4d Staining | 2 | 7 | 0.554142857 | 0.540281849 | 0.204 |
| C4d Staining | 3 | 6 | 1.9865 | 1.587761538 | 1.56 |
| C4d Staining | N/A | 83 | 1.14613253 | 1.138562549 | 0.637 |

TABLE D Histological Features with Significant Differences in dd-cfDNA, by Banff Score

| Feature | Comparison | Z Statistic | Unadj. P-Value | Adj. P-Value |
|---|---|---|---|---|
| Glomerulitis | 0 vs 1 | −1.52 | 0.1279 | 1.0000 |
| Glomerulitis | 0 vs 2 | −2.69 | 0.0072 | 0.1740 |
| Glomerulitis | 0 vs 3 | −3.59 | 0.0003 | 0.0107 |
| Glomerulitis | 1 vs 2 | −1.64 | 0.1006 | 1.0000 |
| Glomerulitis | 1 vs 3 | −2.61 | 0.0092 | 0.2062 |
| Glomerulitis | 2 vs 3 | −0.93 | 0.3524 | 1.0000 |
| Peritubular Capillaritis | 0 vs 1 | −3.02 | 0.0025 | 0.0686 |
| Peritubular Capillaritis | 0 vs 2 | −4.15 | 0.0000 | 0.0011 |
| Peritubular Capillaritis | 0 vs 3 | −1.52 | 0.1290 | 1.0000 |
| Peritubular Capillaritis | 1 vs 2 | −1.99 | 0.0468 | 0.8894 |
| Peritubular Capillaritis | 1 vs 3 | −0.56 | 0.5722 | 1.0000 |
| Peritubular Capillaritis | 2 vs 3 | 0.42 | 0.6776 | 1.0000 |
| Interstitial Inflammation | 0 vs 1 | −3.41 | 0.0007 | 0.0198 |
| Interstitial Inflammation | 0 vs 2 | −3.91 | 0.0001 | 0.0031 |
| Interstitial Inflammation | 0 vs 3 | −3.54 | 0.0004 | 0.0123 |
| Interstitial Inflammation | 1 vs 2 | −0.91 | 0.3625 | 1.0000 |
| Interstitial Inflammation | 1 vs 3 | −1.09 | 0.2736 | 1.0000 |
| Interstitial Inflammation | 2 vs 3 | −0.29 | 0.7696 | 1.0000 |
| Total Interstitial Inflammation | 0 vs 1 | −2.87 | 0.0041 | 0.1019 |
| Total Interstitial Inflammation | 0 vs 2 | −3.14 | 0.0017 | 0.0476 |
| Total Interstitial Inflammation | 0 vs 3 | −4.52 | 0.0000 | 0.0002 |
| Total Interstitial Inflammation | 1 vs 2 | −0.52 | 0.6053 | 1.0000 |
| Total Interstitial Inflammation | 1 vs 3 | −2.15 | 0.0312 | 0.6243 |
| Total Interstitial Inflammation | 2 vs 3 | −1.61 | 0.1084 | 1.0000 |
| Tubulitis | 0 vs 1 | −3.00 | 0.0027 | 0.0698 |
| Tubulitis | 0 vs 2 | −3.19 | 0.0014 | 0.0410 |
| Tubulitis | 0 vs 3 | −1.44 | 0.1488 | 1.0000 |
| Tubulitis | 1 vs 2 | −2.61 | 0.0090 | 0.2062 |
| Tubulitis | 1 vs 3 | −1.24 | 0.2150 | 1.0000 |
| Tubulitis | 2 vs 3 | −0.12 | 0.9050 | 1.0000 |
| c4d Staining | 0 vs 1 | −3.85 | 0.0001 | 0.0038 |
| c4d Staining | 0 vs 2 | 0.21 | 0.8299 | 1.0000 |
| c4d Staining | 0 vs 3 | −1.95 | 0.0517 | 0.9309 |
| c4d Staining | 1 vs 2 | 2.27 | 0.0233 | 0.4883 |
| c4d Staining | 1 vs 3 | 0.79 | 0.4292 | 1.0000 |
| c4d Staining | 2 vs 3 | −1.43 | 0.1524 | 1.0000 |

All patents, patent applications, and published references cited herein are hereby incorporated by reference in their entirety. While the methods of the present disclosure have been described in connection with the specific embodiments thereof, it will be understood that they are capable of further modification. Furthermore, this application is intended to cover any variations, uses, or adaptations of the methods of the present disclosure, including such departures from the present disclosure as come within known or customary practice in the art to which the methods of the present disclosure pertain, and as fall within the scope of the appended claims.

Example 3. Validation of Detection of Kidney Transplant Injury by Assessment of Donor-Derived Cell-Free DNA by Massively Multiplex PCR and Next-Generation Sequencing

Introduction

Kidney transplantation is the best option for patients with end-stage renal disease. According to the United Network for Organ Sharing, more than 19,000 kidneys were transplanted in the United States in 2016 (cen.acs.org), and approximately 200,000 patients are living with a functional kidney transplant (NIH MedlinePlus). Despite life-long immunosuppressive maintenance regimens designed to optimize the therapeutic outcome, approximately 20-30% of patients experience overall renal graft failure within the first 5 years, and only 55% of transplanted kidneys survive to 10 years (cen.acs.org). Thus, a compelling need exists for early intervention strategies that avoid or minimize acute/subclinical rejection episodes and nephrotoxicity, and that allow co-morbidities to be managed and monitored for better therapeutic outcomes.

Current standard-of-care clinical options to monitor kidney health in transplant recipients include protocol biopsies and assessment of dynamic changes in serum creatinine and other parameters, such as proteinuria and levels of immunosuppressive drugs. Although protocol biopsies are considered the “gold standard”, their clinical utility is significantly limited by invasiveness, cost, inadequate sampling, and poor reproducibility. Serum creatinine, the current standard-of-care marker used to screen for renal allograft dysfunction and to indicate when biopsy and histological evaluation of renal tissue are warranted, is a poor marker because of its low sensitivity and specificity (Sigdel et al., Optimizing detection of kidney transplant injury by assessment of donor-derived cell-free DNA by massively multiplex PCR, PLoS One, manuscript in preparation, 2018). Moreover, creatinine is a lagging indicator of renal injury; by the time serum creatinine levels increase, the allograft has already undergone severe and irreversible damage. Thus, an unmet medical need exists to non-invasively detect the early onset of transplant rejection and to help physicians make proactive decisions about managing immunosuppressive therapy and preventing graft injury and loss.

Donor-derived cell-free DNA (dd-cfDNA) can be detected noninvasively in the plasma of transplant patients and is a proven non-invasive biomarker for kidney transplant rejection. The present disclosure provides an assay that can estimate the dd-cfDNA fraction in renal transplant recipients by measuring allele frequency at 13,962 SNPs. A recent clinical validation study demonstrated the ability of this method to discriminate active rejection from non-rejection with a sensitivity of 88.7%, specificity of 73.2%, and AUC of 0.87 using a dd-cfDNA threshold of 1% (Sigdel et al. 2018). Sigdel et al. 2018 showed significantly higher dd-cfDNA levels in both antibody-mediated rejection (ABMR) and T-cell-mediated rejection (TCMR) cases than in non-rejection cases, including those with stable allografts, borderline rejection, and other injuries. The present disclosure describes the analytical validation of this clinical-grade NGS test by determining the limit of blank (LoB), lower limit of detection (LoD), lower limit of quantification (LoQ), linearity, precision (reproducibility and repeatability), and accuracy in measuring the fraction of dd-cfDNA in kidney transplant recipients.

Materials and Methods

The general workflow of this study is shown in FIG. 22.

Plasma Samples

Whole blood samples (20 mL) were collected from healthy volunteers (n=15) and transplant patients (n=6) in Cell-Free DNA BCT tubes (Streck, Omaha, Nebr.). Plasma (5-10 mL) was isolated from blood by centrifugation at 3,220 × g for 30 minutes at 22° C. and stored at −80° C. Cell-free DNA was extracted using either Applicant's in-house extraction chemistry (NICE) (San Carlos, Calif.) or the QIAamp® Circulating Nucleic Acid Kit (Qiagen, Germantown, Md.).

Reference Samples (Cell-Line Derived)

Reference samples were procured from SeraCare Lifesciences (Milford, Mass.) and were prepared by mixing genomic DNA (gDNA) from 5 different cell lines into 3 binary female (recipient)/male (donor) mixtures, 1 related and 2 unrelated, at specific donor fractions (0, 0.1, 0.3, 0.6, 1.2, 2.4, 5, 10, and 15%). The donor fraction in each mixture was verified by SeraCare using droplet digital PCR (ddPCR). The gDNA mixtures were sheared by sonication and size-selected to mimic the expected cfDNA fragment size of 160 base pairs. The concentration of the reference samples was quantified using Quant-iT® or Qubit® High-Sensitivity kits (ThermoFisher, Carlsbad, Calif.).

cfDNA Mixture Samples (Plasma-Derived)

cfDNA extracted from the plasma of healthy volunteers (n=16) was used to prepare 3 unrelated and 6 related binary cfDNA mixtures. The 3 unrelated mixtures were prepared at 7 different target dd-cfDNA levels (0.1, 0.3, 0.6, 1.2, 2.4, 5, and 10%). Of the 6 related cfDNA mixtures, 4 were prepared at donor fractions of 0.1, 0.3, 0.6, and 1.2%, and the remaining 2 at donor fractions of 0.3 and 0.6%. The concentration of cfDNA mixture samples was quantified using Quant-iT® or Qubit® High-Sensitivity kits (ThermoFisher, Carlsbad, Calif.).

Targeted Amplification, SNP Selection, Sequencing Data Analysis and Quality Control

Reference samples and extracted cfDNA mixture samples were used as input for library preparation followed by PCR amplification. Targeted amplification was then achieved by performing mmPCR as previously described in Ryan et al., Validation of an Enhanced Version of a Single-Nucleotide Polymorphism-Based Noninvasive Prenatal Test for Detection of Fetal Aneuploidies, Fetal Diagnosis and Therapy, 40(3):219-223 (2016), but with a different primer pool targeting 13,926 SNP positions. The SNPs were selected for high variant allele frequency across different ethnicities. Bi-allelic SNPs were selected on chromosomes 2, 13, 18, 21, 22, and X, but only chromosomes 2, 13, 18, and 21 were included in the donor fraction analysis. To ensure an accurate donor fraction estimate regardless of patient ethnicity, SNPs were required to have a high minor allele frequency across the major ethnic groups defined in the 1000 Genomes Project. Specifically, at least 75% of SNPs were required to have a minor allele frequency greater than 25% in the European, African, Asian, and American ethnic groups.
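The panel-level selection rule can be sketched in Python. The SNP records (dicts with `chrom` and per-population `maf`), the helper names, and the population labels are hypothetical; in particular, mapping "Asian" to a single 1000 Genomes group is an assumption. Only the chromosomes, the 25% minor allele frequency threshold, and the 75% panel requirement come from the text.

```python
# Hypothetical labels for the four population groups named in the text.
POPULATIONS = ("EUR", "AFR", "EAS", "AMR")
# Chromosomes used in the donor fraction analysis (22 and X are excluded).
ANALYSIS_CHROMOSOMES = {"2", "13", "18", "21"}


def analysis_snps(snps):
    """Keep only SNPs on the chromosomes used for donor fraction analysis."""
    return [s for s in snps if s["chrom"] in ANALYSIS_CHROMOSOMES]


def fraction_with_high_maf(snps, min_maf=0.25):
    """Fraction of SNPs whose minor allele frequency exceeds min_maf in every group."""
    passing = sum(all(s["maf"][p] > min_maf for p in POPULATIONS) for s in snps)
    return passing / len(snps)


def panel_meets_requirement(snps, min_maf=0.25, min_fraction=0.75):
    """True when at least 75% of SNPs pass the MAF threshold in all groups."""
    return fraction_with_high_maf(snps, min_maf) >= min_fraction
```

Because the requirement applies to the panel as a whole, individual SNPs with a low MAF in one group can remain in the panel as long as the 75% fraction is met.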

The PCR amplicons obtained after targeted amplification were barcoded and combined into 32-plex pools, which were sequenced on an Illumina NextSeq 500 instrument (50 cycles, single-end reads). Sequenced reads were demultiplexed and mapped to the hg19 reference genome using Novoalign version 2.3.4 (Novocraft). Bases with a Phred quality score <30 and reads with a mapping quality score <30 were filtered out. Multiple quality checks (QCs) (cluster density, mapping rate, etc.) were applied to each sequencing run, and each sample was confirmed to have the desired number of reads (8 million) after filtering. Any pool failing sequencing run QCs was re-sequenced. Any sample that failed to produce the necessary number of reads was removed from the analysis.
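The read-level filters can be illustrated with a small, library-free Python sketch. The `AlignedBase` record and the helper names are hypothetical stand-ins for whatever pileup representation is derived from the aligned reads; only the thresholds (reads with mapping quality ≥30 and bases with Phred quality ≥30 retained, and 8 million reads per sample) come from the text, and treating 8 million as a minimum is an assumption.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class AlignedBase(NamedTuple):
    """Hypothetical record: the base one read contributes at one SNP position."""
    mapping_quality: int
    base: str
    base_quality: int


def count_alleles(pileup: Iterable[AlignedBase], min_mapq: int = 30,
                  min_base_qual: int = 30) -> Counter:
    """Count alleles at a SNP, skipping reads with MAPQ < 30 and bases with Phred < 30."""
    counts = Counter()
    for rec in pileup:
        if rec.mapping_quality < min_mapq or rec.base_quality < min_base_qual:
            continue
        counts[rec.base] += 1
    return counts


def sample_passes_read_qc(filtered_reads: int, required_reads: int = 8_000_000) -> bool:
    """Keep a sample only if it has the required number of reads after filtering."""
    return filtered_reads >= required_reads
```

The per-SNP allele counts produced this way are the observed allele ratios that feed the donor fraction estimate.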

Percent Dd-cfDNA Calculation

For each sample, the fraction of donor-derived cfDNA (donor fraction) was estimated from the minor allele frequencies measured at all SNPs where the recipient was homozygous. The donor fraction was calculated as a maximum likelihood estimate over a search range from 0.0001 to 0.25 in increments of 0.0001. Our approach did not require a separate donor sample; instead, donor genotypes were represented by a probability model that incorporated both population-based prior probabilities (from the 1000 Genomes Project) and the observed allele ratios. No heuristic adjustment was needed for related donors, because the model has no built-in assumptions about the fraction of genotype concordance between the recipient and the donor. Instead, the corresponding genotype inheritance constraints were incorporated into the donor genotype probability model. This estimate mode was referred to as the “related estimate”, and the unconstrained estimate was referred to as the “standard estimate”.
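The grid-search maximum likelihood estimate can be sketched in Python under simplifying assumptions that are ours, not the source's. First, the recipient-homozygous SNPs have already been identified and oriented so that the recipient carries allele A at each. Second, donor genotype priors follow Hardy-Weinberg proportions from a population frequency q of allele B. Third, B-allele read counts are binomial. Fourth, a fixed, hypothetical per-base error rate of 0.001 applies. With a donor fraction f, donor genotypes AA, AB, and BB give expected B-allele fractions of 0, f/2, and f, and each SNP's likelihood is the prior-weighted sum over these three cases. Only the unconstrained "standard estimate" is shown, since related-donor inheritance constraints are not modeled, and `estimate_donor_fraction` is a hypothetical name.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import binom

# Search grid from the text: 0.0001 to 0.25 in increments of 0.0001.
DONOR_FRACTION_GRID = np.arange(1, 2501) * 1e-4


def estimate_donor_fraction(b_counts, depths, pop_b_freq, error_rate=0.001):
    """Grid-search MLE of donor fraction at recipient-homozygous (AA) SNPs.

    b_counts: reads supporting allele B at each SNP.
    depths: total filtered reads at each SNP.
    pop_b_freq: population frequency of allele B at each SNP (genotype prior).
    """
    b_counts = np.asarray(b_counts)
    depths = np.asarray(depths)
    q = np.clip(np.asarray(pop_b_freq, dtype=float), 1e-6, 1 - 1e-6)
    # Hardy-Weinberg log-priors for donor genotypes AA, AB, BB; shape (3, n_snps)
    log_prior = np.log(np.stack([(1 - q) ** 2, 2 * q * (1 - q), q ** 2]))
    f = DONOR_FRACTION_GRID[:, None]  # shape (n_grid, 1), broadcasts over SNPs
    per_genotype = []
    for n_b_alleles in range(3):  # donor carries 0, 1, or 2 copies of allele B
        true_b = f * n_b_alleles / 2.0
        # Symmetric sequencing error moves a small fraction of reads between alleles
        p_b = true_b * (1 - error_rate) + (1 - true_b) * error_rate
        per_genotype.append(log_prior[n_b_alleles] + binom.logpmf(b_counts, depths, p_b))
    # Marginalize the unknown donor genotype per SNP, then sum over SNPs
    log_lik = logsumexp(np.stack(per_genotype), axis=0).sum(axis=1)
    return float(DONOR_FRACTION_GRID[np.argmax(log_lik)])


if __name__ == "__main__":
    # Simulated example with a true donor fraction of 5% (illustrative only)
    rng = np.random.default_rng(0)
    q = rng.uniform(0.25, 0.5, size=500)
    donor_b = rng.binomial(2, q)
    depths = np.full(500, 2000)
    frac = 0.05 * donor_b / 2.0
    b_counts = rng.binomial(depths, frac * 0.999 + (1 - frac) * 0.001)
    print(estimate_donor_fraction(b_counts, depths, q))
```

Because the donor genotype is marginalized rather than called, no donor sample is needed; a related-donor variant would replace the Hardy-Weinberg prior with one conditioned on the recipient's genotype.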

Experimental Plan and Statistical Analysis

To evaluate the analytical performance of the test, LoB, LoD, LoQ, linearity, precision, and accuracy were measured based on CLSI guidelines (EP17-A2, EP05-A3), as further described below. Tables 1A-1B below show the experimental design.

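TABLE 1A Experimental design for determining LoB, LoD, LoQ and linearity

| Parameter | Samples | Input mass (ng) | Sample mixtures | Donor fractions (%) | Number of measurements |
|---|---|---|---|---|---|
| LoB | Ref. samples (n = 5 blanks) | 15, 30, 45 | N/A | N/A | 68 |
| LoB | Healthy blood donors (n = 15) | Variable | N/A | N/A | 60 |
| LoB, total | | | | | 128 |
| LoD | Ref. samples | 15, 30, 45 | 3: 2 unrelated, 1 related | 0.1, 0.3, 0.6 | 274 |
| LoD | Healthy blood donors (n = 16) | 15 | 3: unrelated cfDNA mixes | 0.1, 0.3, 0.6 | 60 |
| LoD | Healthy blood donors (n = 16) | 15 | 3: related cfDNA library mixes | 0.1, 0.3, 0.6 | 27 |
| LoD | Healthy blood donors (n = 16) | Variable | 3: related mixes | 0.3, 0.6 | 28 |
| LoD, total | | | | | 389 |
| LoQ, linearity | Ref. samples | 15, 30, 45 | 3: 2 unrelated, 1 related | 0.1, 0.3, 0.6, 1.2, 2.4, 5, 10, 15 | 638 |
| LoQ, linearity | Healthy blood donors (n = 14) | 15 | 3: unrelated cfDNA mixes | 0.1, 0.3, 0.6, 1.2, 2.4, 5, 10 | 96 |
| LoQ, linearity | Healthy blood donors (n = 14) | 15 | 3: related cfDNA library mixes | 0.1, 0.3, 0.6, 1.2 | 36 |
| LoQ, linearity | Healthy blood donors (n = 14) | Variable | 3: related mixes | 0.3, 0.6 | 28 |
| LoQ, linearity, total | | | | | 798 |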

TABLE 1B Experimental design for determining accuracy, reproducibility, and repeatability

| Parameter | Samples | Input mass (ng) | Sample mixtures | Donor fractions (%) | Number of measurements |
|---|---|---|---|---|---|
| Accuracy | Ref. samples | 15, 30, 45 | 3: 2 unrelated, 1 related | 0.1, 0.3, 0.6, 1.2, 2.4, 5, 10, 15 | 638 |
| Accuracy, total | | | | | 638 |
| Reproducibility | Ref. samples | 15, 30, 45 | 3: 2 unrelated, 1 related | 0.1, 0.3, 0.6, 1.2, 2.4, 5, 10 | 504 |
| Reproducibility | Transplant samples (n = 6) | Variable | N/A | Variable | 12 |
| Reproducibility, total | | | | | 516 |
| Repeatability | Ref. samples | 30 | 1: related | 0.6, 2.4 | 128 |
| Repeatability, total | | | | | 128 |

Limit of Blank

The Limit of Blank (LoB) was established using 1) reference samples (blanks, i.e., single-genome samples) prepared from sheared gDNA of 5 different pure cell lines obtained from SeraCare, and 2) plasma-derived cfDNA samples (n=15) collected from healthy blood donors who had never had a transplant or a recent blood transfusion. For reference samples, each pure cell line was tested at 3 different library input amounts (15, 30, and 45 ng) to mimic the expected cfDNA yield from 20 mL blood collections. For the plasma-derived cfDNA samples, however, input amounts for library preparation were kept variable to simulate input variation in real samples. In compliance with CLSI guidelines, samples were tested in triplicate on 3 different days with 2 different sequencing reagent lots, with at least 60 measurements per lot, for a total of 128 blank measurements.

LoB is defined as the empirical 95th percentile value measured from a set of blank (no-analyte) samples. The calculation was performed twice for the cell line-derived reference samples (once for each reagent lot) and again for the plasma-derived cfDNA. The plasma-derived cfDNA samples included fewer replicates than recommended by the CLSI guidelines and were used for a consistency check only. The final LoB is the maximum of the lot 1 LoB and the lot 2 LoB. All calculations were performed once using the standard donor fraction estimate and once using the related donor fraction estimate in order to measure the corresponding LoBs for the two estimate methods.
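The LoB computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the validated implementation: the helper names `empirical_percentile` and `final_lob` are hypothetical, and the rank-interpolation convention (rank 0.5 + 0.95·B on the sorted blanks, which is our reading of the nonparametric option in CLSI EP17-A2) is an assumption, since the text only specifies an empirical 95th percentile.

```python
import math


def empirical_percentile(values, p=0.95):
    """Rank-interpolated empirical percentile of `values`.

    Assumption: rank position 0.5 + p * B (1-based) on the sorted data,
    our reading of the nonparametric LoB option in CLSI EP17-A2.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one blank measurement")
    rank = 0.5 + p * n                      # 1-based, possibly fractional
    lo = min(max(math.floor(rank), 1), n)   # clamp to valid ranks
    hi = min(max(math.ceil(rank), 1), n)
    frac = rank - math.floor(rank)
    return xs[lo - 1] + frac * (xs[hi - 1] - xs[lo - 1])


def final_lob(blanks_by_lot):
    """Final LoB: the maximum of the per-lot LoBs, as described in the text."""
    return max(empirical_percentile(vals) for vals in blanks_by_lot.values())


# Hypothetical blank measurements (% donor fraction) for two reagent lots.
lots = {
    "lot1": [0.00, 0.01, 0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.12, 0.02],
    "lot2": [0.01, 0.03, 0.02, 0.06, 0.04, 0.09, 0.11, 0.15, 0.03, 0.05],
}
print(f"LoB = {final_lob(lots):.3f}%")
```

Running the same function separately on the related and unrelated donor fraction estimates would yield the two LoBs reported for the two estimate methods.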

Limit of Detection and Limit of Quantification

Limit of Detection (LoD) and Limit of Quantification (LoQ) were measured using both cell line-derived reference samples from SeraCare and plasma-derived cfDNA mixtures from healthy volunteers. The reference samples were tested at 3 different cfDNA input amounts (15, 30 and 45 ng). LoD was measured at the three lowest donor fraction levels (0.1, 0.3, 0.6%) in 6 replicates by 2 operators on different days, using different reagent lots and sequencing instruments. Plasma-derived cfDNA mixtures were tested at 15 ng input for both unrelated and related mixtures. The three unrelated cfDNA mixtures were tested at the 3 lowest donor fraction levels (0.1, 0.3, 0.6%) in 6 replicates. Of the 6 related cfDNA mixtures, three were tested at the 3 lowest donor fraction levels (0.1, 0.3, 0.6%) in triplicate, and the remaining three (mother-son) were tested at 2 donor fraction levels (0.3, 0.6%), processed in duplicate. LoQ analysis included all the samples used for LoD as well as a corresponding set of replicates at higher donor fractions (1.2, 2.4, 5, 10, 15% for cell lines and 1.2, 2.4, 5, 10% for plasma-derived cfDNA).

LoD is calculated following the parametric estimate method specified in CLSI EP17-A2, which computes LoD by adding a standard deviation term to the LoB. The standard deviation term consists of the pooled standard deviation (estimated from the set of replicates described for the LoD), multiplied by a correction factor that depends on the number of samples. LoD is calculated for each input mass and donor fraction estimate method by combining the corresponding LoB with the corresponding standard deviation measurement.
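A minimal sketch of the parametric LoD calculation, assuming the EP17-A2 form LoD = LoB + c_p · SD_L, where SD_L is the pooled standard deviation of the low-level replicates and c_p = 1.645 / (1 − 1/(4(L − J))), with L the total number of low-level results and J the number of low-level samples. The function names are hypothetical, and the exact grouping of replicates used in the study is not specified here.

```python
import math
import statistics


def pooled_sd(groups):
    """Pooled SD: sqrt(sum((n_i - 1) * s_i^2) / sum(n_i - 1)) over replicate groups."""
    num, dof = 0.0, 0
    for g in groups:
        if len(g) < 2:
            continue  # a single replicate carries no variance information
        num += (len(g) - 1) * statistics.variance(g)
        dof += len(g) - 1
    if dof == 0:
        raise ValueError("need at least one group with two or more replicates")
    return math.sqrt(num / dof)


def parametric_lod(lob, groups, z=1.645):
    """LoD = LoB + c_p * SD_L with c_p = z / (1 - 1 / (4 * (L - J)))."""
    total = sum(len(g) for g in groups)  # L: total number of low-level results
    n_samples = len(groups)              # J: number of low-level samples
    if total <= n_samples:
        raise ValueError("need more results than samples")
    c_p = z / (1.0 - 1.0 / (4.0 * (total - n_samples)))
    return lob + c_p * pooled_sd(groups)


# Hypothetical replicate donor fractions (%) at three low-level samples.
low_level = [
    [0.08, 0.12, 0.10, 0.11, 0.09, 0.13],
    [0.28, 0.31, 0.33, 0.27, 0.30, 0.32],
    [0.58, 0.62, 0.61, 0.57, 0.63, 0.60],
]
print(f"LoD = {parametric_lod(0.11, low_level):.3f}%")
```

Because c_p approaches 1.645 as the number of results grows, the correction mainly matters for small replicate sets.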

An appropriate LoQ assessment was selected based on the quantification requirements of the test process. LoQ is defined as the lowest donor fraction at which sufficient relative measurement precision is achieved, bounded below by the LoD. Sufficient relative measurement precision was defined as a coefficient of variation (CV) of at most 20%, where CV is the measurement standard deviation divided by the mean. The CV of the donor fraction was observed to depend on the donor fraction (d) through the relationship CV = a + b·exp(−c·d), where the model parameters a, b and c are estimated from the data using a non-linear least squares procedure. The CV model (described by parameters a, b, c) was estimated for each input mass and donor fraction estimate method, and the corresponding LoQ was the lowest value for which the model satisfies the CV requirement, with the LoD as the lowest possible LoQ. This model-based approach requires including higher donor fraction measurements in the LoQ assessment to ensure convergence to an appropriate constant value at high donor fraction.
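The CV model above can be fit with any non-linear least squares routine. The sketch below (hypothetical helper `fit_cv_model`) uses a simple substitute: because CV = a + b·exp(−c·d) is linear in a and b for a fixed c, it solves for a and b in closed form and scans c over an assumed grid, keeping the fit with the smallest residual sum of squares. Both the grid range and this grid-scan method are assumptions, not the procedure used in the study.

```python
import math


def fit_cv_model(d, cv, c_grid=None):
    """Least-squares fit of CV = a + b * exp(-c * d) by scanning c.

    For each candidate c, (a, b) come from ordinary least squares of
    CV on x = exp(-c * d); the c with the lowest SSE is kept.
    """
    if c_grid is None:
        c_grid = [0.05 * k for k in range(1, 1201)]  # assumed range: 0.05 to 60
    n = len(d)
    y_bar = sum(cv) / n
    best = None
    for c in c_grid:
        x = [math.exp(-c * di) for di in d]
        x_bar = sum(x) / n
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # x carries no information for this c
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, cv))
        b = sxy / sxx
        a = y_bar - b * x_bar
        sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, cv))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    if best is None:
        raise ValueError("need at least two distinct donor fractions")
    return best[1], best[2], best[3]


# Hypothetical empirical CVs (%) at the targeted donor fractions (%).
fractions = [0.1, 0.3, 0.6, 1.2, 2.4, 5.0, 10.0, 15.0]
empirical_cv = [13.1, 9.0, 5.8, 3.0, 1.6, 1.2, 1.0, 1.1]
print(fit_cv_model(fractions, empirical_cv))
```

Both d and CV are taken in percent here, matching how the LoQ crossings are reported in the text.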

Linearity and Accuracy

Linearity was measured using cell line-derived reference samples at cfDNA input amounts of 15, 30 and 45 ng at all manufactured donor fraction levels (0.1, 0.3, 0.6, 1.2, 2.4, 5, 10, 15%) by 2 operators on different days, using different reagent lots and sequencing instruments. All seven donor fractions (0.1, 0.3, 0.6, 1.2, 2.4, 5, 10%) of the unrelated plasma-derived cfDNA mixture samples at 15 ng input were used to compare linearity with the cell line-derived data. For the plasma-derived cfDNA mixtures, 6 replicates of the 3 lowest donor fractions (0.1, 0.3, 0.6%) and triplicates of the 4 higher donor fractions (1.2, 2.4, 5, 10%) were assayed. To evaluate the accuracy, or trueness, of the transplant test, SeraCare reference mixtures at 8 donor fractions up to 15% were used at 15, 30 and 45 ng input.

Linearity was evaluated based on the R² value produced by a standard linear regression analysis of the relationship between measured donor fraction and targeted mixture fraction. Accuracy was evaluated based on linear regression analysis of the relationship between measured donor fraction and the orthogonal ddPCR measurement.
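The regression used for linearity (measured vs. targeted donor fraction) and accuracy (measured vs. ddPCR donor fraction) can be sketched with ordinary least squares. The helper name `linear_regression` is hypothetical, and this sketch reports only point estimates, not the 95% confidence intervals given in the tables.

```python
def linear_regression(x, y):
    """Ordinary least squares fit y = slope * x + intercept, with R^2."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # For simple regression with an intercept, R^2 = Sxy^2 / (Sxx * Syy).
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


# Hypothetical linearity data: targeted vs. measured donor fraction (%).
targeted = [0.1, 0.3, 0.6, 1.2, 2.4, 5.0, 10.0, 15.0]
measured = [0.11, 0.31, 0.63, 1.25, 2.52, 5.2, 10.6, 15.9]
print(linear_regression(targeted, measured))
```

For accuracy, the same function would be called with the ddPCR-measured donor fraction as `x`.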

Precision

Precision was measured by testing reproducibility (inter-run) and repeatability (intra-run) across 632 reference sample measurements. To assess inter-run reproducibility, 3 SeraCare donor-recipient mixtures at donor fractions of 0.1, 0.3, 0.6, 1.2, 2.4, 5 and 10% were tested with replicates at 15, 30 and 45 ng input. Repeatability was determined by measuring variability between technical replicates of samples measured under similar conditions. One related (mother-son) SeraCare reference mixture at 0.6% and 2.4% donor fractions was assayed by a single operator, reagent lot, and instrument, for a total of 128 measurements. In addition to cfDNA mixtures, matched blood draws (4 tubes/patient) from transplant recipients were run in duplicate and evaluated for reproducibility in clinical samples. Samples were processed by 2 different operators on 8 different days (24 runs across 23 days) with 3 reagent lots and 17 sequencing instruments.

Repeatability was defined as the coefficient of variation (CV) measured across the set of replicates at a single targeted donor fraction under matched conditions. Thus, CV was calculated once at 0.6% donor fraction and once at 2.4%. Reproducibility was also measured using CV, calculated separately for each combination of DNA input amount and mixture fraction.
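As a sketch of these CV-based precision metrics, the following groups replicate measurements by (input mass, targeted donor fraction) and computes CV = SD/mean for each group. The record layout and function names are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption.

```python
import statistics
from collections import defaultdict


def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean * 100."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def cv_by_condition(records):
    """CV (%) per (input_ng, target_fraction) group of measured donor fractions.

    Each record is a hypothetical (input_ng, target_fraction, measured_fraction)
    tuple; groups with fewer than two replicates are skipped.
    """
    groups = defaultdict(list)
    for input_ng, target, measured in records:
        groups[(input_ng, target)].append(measured)
    return {key: coefficient_of_variation(vals)
            for key, vals in groups.items() if len(vals) >= 2}


# Hypothetical replicate measurements (% donor fraction).
runs = [(15, 0.6, 0.61), (15, 0.6, 0.59), (15, 0.6, 0.62),
        (30, 2.4, 2.41), (30, 2.4, 2.38), (30, 2.4, 2.44)]
print(cv_by_condition(runs))
```

Repeatability corresponds to a single group measured under matched conditions; reproducibility repeats the calculation for each input/fraction combination.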

Results

LoB was calculated using 64 measurements for each of two reagent lots. The LoB was 0.11% using the unrelated donor estimate and 0.23% using the related donor estimate. Evaluation of plasma-derived cfDNA measurements only (combined across both lots) resulted in an LoB of 0.04% (unrelated) and 0.08% (related), suggesting that the LoB in patient samples may be equivalent or superior to that measured using reference samples, although the sample size is limited (60 measurements). There was no significant difference between DNA input amounts. FIG. 23 shows histograms of the relevant donor fraction measurements broken down by method and lot.

LoD was calculated from 168 unrelated and 220 related measurements, resulting in an LoD of 0.15% (unrelated) and 0.29% (related). These numbers exclude one sample that failed QC due to an insufficient number of reads. Note that the difference in LoD between related and unrelated donors was approximately equal to the difference in the corresponding LoB, meaning that the measurement variance near the LoD was approximately the same for the two methods. There was no significant impact of DNA input amount. Following an approach similar to the one taken for the LoB analysis, restriction to plasma-derived cfDNA measurements resulted in a lower estimated LoD: 0.05% (unrelated) and 0.11% (related), although the number of measurements was less than ideal (54 related, 60 unrelated).

LoQ was calculated from 381 unrelated and 412 related measurements, after exclusion of 5 samples due to an insufficient number of reads. Empirical CVs were calculated in the set of sample replicates at each targeted donor fraction, and all were less than 20%, for both cell line-derived and plasma-derived cfDNA. Parametric models were fit for each reagent lot, once for related mixtures and once for unrelated mixtures. Empirical CVs and the resulting parametric models are shown in FIG. 24. The modeled CVs were also less than 20% for all donor fractions greater than or equal to the LoD. Thus, the LoQ was equal to the LoD in all scenarios.

LoB Analysis:

Tables 2-4 below summarize the mean, median, and standard deviation values of the measured donor fractions for each lot and mode of the test.

TABLE 2 Mean values of measured donor fractions for related and unrelated cases for Lots 1 and 2
Mean donor fraction | Lot 1 | Lot 2
Related | 0.03% | 0.06%
Unrelated | 0.02% | 0.03%

TABLE 3 Median values of measured donor fractions for related and unrelated cases for Lots 1 and 2
Median donor fraction | Lot 1 | Lot 2
Related | 0.01% | 0.03%
Unrelated | 0.01% | 0.01%

TABLE 4 Standard deviation values of measured donor fractions for related and unrelated cases for Lots 1 and 2
Std. dev. of donor fraction | Lot 1 | Lot 2
Related | 0.05% | 0.1%
Unrelated | 0.02% | 0.05%

To demonstrate the performance of the test for gDNA and cfDNA samples separately, we computed the LoB for each case using 60 measurements from cfDNA samples and 68 measurements from gDNA samples. To increase the sample size, we did not distinguish between the lots. Histograms of the measurements for each DNA type and mode of the test are depicted in FIG. 29, and the corresponding LoB values are given in Table 5.

TABLE 5 LoB values for related and unrelated modes of the test for gDNA and cfDNA samples
LoB | gDNA | cfDNA
Related | 0.23% | 0.08%
Unrelated | 0.11% | 0.04%

LoD Analysis:

The parametric LoD computation method required that: (i) the measurements from low-level samples (approximately) followed a Gaussian distribution, and (ii) the empirical standard deviations of those samples (approximately) remained constant as a function of the empirical mean. Histograms of centered, measured donor fractions for each lot and each test mode are shown in FIG. 30. The empirical standard deviation as a function of the empirical mean for both lots and test modes is shown in FIG. 31. The data disclosed in FIGS. 30 and 31 demonstrated that these two conditions are satisfied for both related and unrelated low-level samples.

To demonstrate LoD for gDNA and cfDNA samples separately, as well as to see the effect of input amount for gDNA samples, the LoD analysis outlined above was performed for these sets of samples separately, using the corresponding LoB values for each case. Specifically, 54 related and 60 unrelated measurements were used for the cfDNA case. For the gDNA case, 18 related and 36 unrelated measurements were used for each of the 15 ng and 45 ng inputs, and 130 related and 36 unrelated measurements were used for the 30 ng input. The computed LoD values with respect to test mode and input amount for gDNA samples are shown in Table 6 below, and the LoD values for the two test modes for cfDNA samples are shown in Table 7 below.

TABLE 6 LoD values for related and unrelated modes of the test for 15, 30, 45 ng inputs for gDNA samples
gDNA LoD | 15 ng | 30 ng | 45 ng
Related | 0.28% | 0.26% | 0.25%
Unrelated | 0.13% | 0.13% | 0.12%

TABLE 7 LoD values for related and unrelated modes of the test for cfDNA samples
cfDNA | LoD
Related | 0.11%
Unrelated | 0.05%

LoQ Analysis:

Similar to the LoD analysis, we evaluated LoQ numbers for gDNA samples, which were further partitioned with respect to their input amounts. As depicted in FIG. 32, all the measured CV values for all the spike levels tested were below the 20% cutoff for unrelated samples at all input levels, as well as for related samples at the 15 and 45 ng input levels. Thus, for all these cases, the lower LoQ was equal to the LoD, by definition. For related samples at the 30 ng input level, the fitted curve intersected the 20% CV level at approximately 0.174%, which was lower than the corresponding LoD (0.26%). Thus, the lower LoQ was again equal to the LoD, by definition. We also computed LoQ values for cfDNA samples, as depicted in FIG. 33; for both test modes, the lower LoQ was equal to the corresponding LoD. The estimated parameters of the non-linear CV fit for every scenario for which LoQ values are reported are shown in Table 8 below.

TABLE 8 Estimated parameters of the exponential decaying model of the CVfor every scenario we report LoQ values Data Set a b c cfDNA + gDNA,0.950216 16.4685 1.88562 Related, Lot 1 cfDNA + gDNA, 1.82651 24.29486.82745 Related, Lot 2 cfDNA + gDNA, 0.557873 6.16417 1.53284 Unrelated,Lot 1 cfDNA + gDNA, 0.715364 7.00144 1.00344 Unrelated, Lot 2 gDNA,Related, 15 ng 0.907757 18.3994 1.97114 gDNA, Related, 30 ng 0.79889245.2805 4.943 gDNA, Related, 45 ng 0.746606 7.69009 2.62489 gDNA,Unrelated, 15 ng 1.06598 14.2357 6.04647 gDNA, Unrelated, 30 ng 1.436217.0526 7.3715 gDNA, Unrelated, 45 ng 0.801393 12.0185 5.69333 cfDNA,Related 1.88546 13275.4 53.5112 cfDNA, Unrelated 0.654995 10.19716.67823
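Given fitted parameters a, b and c, the lower LoQ can be computed in closed form: for b > 0 and c > 0 the modeled CV decreases with d, so the 20% crossing is d* = −ln((20 − a)/b)/c, and the LoQ is the larger of d* and the LoD. A minimal sketch with hypothetical function names, assuming CV and d are both expressed in percent. With the Table 8 parameters for gDNA, related, 30 ng (a = 0.798892, b = 45.2805, c = 4.943), it gives d* ≈ 0.174%, consistent with the text, so the LoQ equals the LoD of 0.26%.

```python
import math


def cv_crossing(a, b, c, cv_limit=20.0):
    """Donor fraction (%) where a + b*exp(-c*d) falls to cv_limit; 0.0 if met at d = 0."""
    if c <= 0:
        raise ValueError("expected a decaying model (c > 0)")
    if a >= cv_limit:
        raise ValueError("modeled CV never drops below the limit")
    if a + b <= cv_limit:
        return 0.0  # CV already meets the limit at d = 0
    return -math.log((cv_limit - a) / b) / c


def lower_loq(a, b, c, lod, cv_limit=20.0):
    """Lower LoQ: the CV-model crossing, bounded below by the LoD."""
    return max(lod, cv_crossing(a, b, c, cv_limit))


# gDNA, related, 30 ng parameters from Table 8, with the LoD from Table 6.
print(cv_crossing(0.798892, 45.2805, 4.943))
print(lower_loq(0.798892, 45.2805, 4.943, lod=0.26))
```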

Linearity, Accuracy and Precision

Linearity was measured from 381 unrelated and 412 related samples, after removal of 5 samples that failed QC due to an insufficient number of reads. Accuracy was measured from the subset of these samples (the cell line-derived reference samples) for which a ddPCR donor fraction was available as a reference: 285 unrelated and 349 related, after exclusion of 4 samples due to an insufficient number of reads. The individual measurements and linear regression lines are shown in FIG. 25 (linearity) and FIG. 26 (accuracy). Linearity was measured by linear regression against the targeted donor fraction, and accuracy was measured by linear regression against the ddPCR-measured donor fraction. The linear regression results are shown in Tables 9 and 10 below. The donor fraction measurement was shown to be highly linear (R² greater than 0.99 in all models) and accurate (slope approximately 1, intercept approximately zero). There was no significant difference between related and unrelated donors, as determined by combined regression.

TABLE 9 Linear regression results for linearity and accuracy, including 95% confidence intervals
Model | Slope (95% CI) | Intercept (95% CI) | R² (95% CI)
Accuracy, combined | 1.0591 (0.9763, 1.1418) | 0.0001 (−0.0045, 0.0047) | 0.9988 (0.9987, 0.9990)
Accuracy, related | 1.0333 (0.9241, 1.1425) | −0.0001 (−0.0047, 0.0046) | 0.9959 (0.9986, 0.9990)
Accuracy, unrelated | 1.0664 (0.9416, 1.1912) | 0.0008 (−0.0076, 0.0092) | 0.9997 (0.9997, 0.9998)
Linearity, combined | 1.0516 (0.9781, 1.1251) | 0.0004 (−0.0033, 0.0042) | 0.9968 (0.9964, 0.9972)
Linearity, related | 0.9852 (0.8895, 1.0809) | 0.0008 (−0.0031, 0.0047) | 0.9991 (0.9989, 0.9992)
Linearity, unrelated | 1.0813 (0.9721, 1.1906) | 0.0006 (−0.0060, 0.0071) | 0.9995 (0.9994, 0.9996)

TABLE 10 Linear regression results for linearity, including 95% confidence intervals, for clinical samples
Slope (95% CI) | Intercept (95% CI) | R² (95% CI)
1.0125 (−0.3932, 2.4153) | −0.0002 (−0.0121, 0.0117) | 0.9998 (0.9984, 1.0000)

The precision of the methods disclosed herein was evaluated by measuring repeatability within a single experimental run and set of conditions, and reproducibility across a varied set of conditions. Repeatability was measured using CV at two targeted donor fractions (0.6% and 2.4%), each using 64 cell line-derived sample measurements, with no samples removed due to QC failure. The CV was 1.85% (95% CI: 1.34%-2.73%) at 0.6% targeted donor fraction and 1.22% (95% CI: 0.88%-1.80%) at 2.4% targeted donor fraction. Per-input reproducibility was calculated using 498 measurements, after removal of 6 samples that failed QC due to an insufficient number of reads. For 15 ng input, the CV was 3.10% (95% CI: 1.58%-4.37%); for 30 ng input, the CV was 3.07% (95% CI: 1.42%-4.50%); and for 45 ng input, the CV was 1.99% (95% CI: 1.10%-2.75%). Per-lot reproducibility was calculated from a subset of the aforementioned samples comprising 374 measurements, which excludes 4 samples that failed QC due to an insufficient number of reads. The CV for Lot 1 was 3.99% (95% CI: 2.42%-5.41%) and the CV for Lot 2 was 4.44% (95% CI: 2.69%-6.02%).

We also evaluated the linearity and precision of the test for clinical transplant samples, in line with the aforementioned analysis. To this end, 12 measurements were used, none of which failed QC. Linearity was measured by performing linear regression of the measured donor fraction from Lot 2 against Lot 1. The measurements and linear regression lines are shown in FIG. 27, and the corresponding linear regression results are provided in Table 10. The precision of the test was estimated as a CV of 4.29% (95% CI: 0.65%-6.86%). Finally, we observed 100% concordance (95% CI: 54.07%-100%) between replicates.

Accuracy Analysis:

To demonstrate the accuracy for cfDNA samples, we used the donor fraction estimated from SNPs in HNR in lieu of the ddPCR measurement used for gDNA. This method was used as a more precise alternative to the conventional donor fraction estimate using non-HNR SNPs for the following reason: since HNR are non-recombining, and the cfDNA samples were designed to have a female background with a male spike-in, the Y chromosome allele measurements were directly attributable to the donor signal. The accuracy analysis was carried out using 63 related and 96 unrelated cfDNA measurements, excluding one sample that failed QC due to an insufficient number of reads. The individual measurements and linear regression lines are shown in FIG. 34, and the corresponding linear regression results are shown in Table 11 below. It should be noted that the relatively wider confidence intervals for the cfDNA estimates compared with their gDNA counterparts are probably a result of the smaller sample size of the former.

TABLE 11 Linear regression results for accuracy of cfDNA samples, including 95% confidence intervals
cfDNA accuracy | Slope (95% CI) | Intercept (95% CI) | R² (95% CI)
Unrelated | 1.0108 (0.8038, 1.2179) | 0.0002 (−0.0076, 0.0080) | 0.9996 (0.9994, 0.9997)
Related | 1.0440 (0.7727, 1.3153) | 0.0007 (−0.0012, 0.0027) | 0.9706 (0.9517, 0.9993)
Combined | 1.0073 (0.8484, 1.1662) | 0.0005 (−0.0042, 0.0053) | 0.9991 (0.9987, 0.9993)

Linearity Analysis:

Similar to the previous performance metrics, we broke down the linearity analysis for gDNA and cfDNA samples separately. Specifically, for the gDNA analysis, 349 related and 285 unrelated measurements were used, and for the cfDNA analysis, 63 related and 96 unrelated measurements were used. The individual measurements and linear regression lines for gDNA samples are shown in FIG. 35, and the individual measurements on a log-log scale in FIG. 36. Similarly, the individual measurements and linear regression lines for cfDNA samples are shown in FIG. 37, and the individual measurements on a log-log scale in FIG. 38. Tables 12 and 13 contain the corresponding linear regression results for gDNA and cfDNA, respectively.

TABLE 12 Linear regression results for linearity of gDNA samples, including 95% confidence intervals
gDNA linearity | Slope (95% CI) | Intercept (95% CI) | R² (95% CI)
Unrelated | 1.0804 (0.9540, 1.2069) | 0.0007 (−0.0077, 0.0091) | 0.99989 (0.99986, 0.99992)
Related | 0.9876 (0.8833, 1.0920) | 0.0005 (−0.0041, 0.0052) | 0.9994 (0.9974, 0.9995)
Combined | 1.0515 (0.9693, 1.1338) | 0.0003 (−0.0043, 0.0049) | 0.9969 (0.9964, 0.9974)

TABLE 13 Linear regression results for linearity of cfDNA samples, including 95% confidence intervals
cfDNA linearity | Slope (95% CI) | Intercept (95% CI) | R² (95% CI)
Unrelated | 1.0787 (0.8574, 1.300) | 0.0002 (−0.0076, 0.0080) | 0.9962 (0.9943, 0.9975)
Related | 1.3368 (0.9895, 1.6841) | 0.0001 (−0.0020, 0.0022) | 0.9713 (0.9528, 0.9965)
Combined | 1.0734 (0.9038, 1.2430) | 0.0008 (−0.0039, 0.0055) | 0.9953 (0.9935, 0.9965)

Repeatability and Reproducibility Analysis:

To compute the confidence intervals on the estimated CVs for the repeatability analysis, we used the classical bounds described in McKay, “Distribution of the coefficient of variation and the extended t distribution,” Journal of the Royal Statistical Society, 95(4): 695-698 (1932), based on a chi-squared approximation. The derivation of these bounds assumes that the underlying measurements from which the CV is estimated are realizations from Gaussian distributions. The histograms in FIG. 39 verified that this assumption is justified in our case.
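A sketch of McKay's approximate confidence interval for a CV, in the form commonly quoted: with ν = n − 1 and u1, u2 the upper and lower α/2 chi-squared quantiles on ν degrees of freedom, the bounds are k/√((u1/n − 1)k² + u1/ν) and k/√((u2/n − 1)k² + u2/ν), where k is the estimated CV as a fraction. To stay dependency-free, the chi-squared quantiles are approximated with the Wilson-Hilferty transformation; that approximation is an assumption of this sketch, not part of the described analysis, and an exact quantile function from a statistics library could be substituted. The function names are hypothetical, and the sketch is not claimed to reproduce the reported intervals.

```python
import math
from statistics import NormalDist


def chi2_quantile(p, df):
    """Approximate chi-squared quantile via Wilson-Hilferty (fine for moderate df)."""
    z = NormalDist().inv_cdf(p)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * math.sqrt(h)) ** 3


def mckay_cv_interval(cv, n, alpha=0.05):
    """McKay's approximate (1 - alpha) CI for a normal-population CV (cv as a fraction)."""
    nu = n - 1
    u1 = chi2_quantile(1.0 - alpha / 2.0, nu)  # upper quantile gives the lower bound
    u2 = chi2_quantile(alpha / 2.0, nu)        # lower quantile gives the upper bound
    k2 = cv * cv
    lower = cv / math.sqrt((u1 / n - 1.0) * k2 + u1 / nu)
    upper = cv / math.sqrt((u2 / n - 1.0) * k2 + u2 / nu)
    return lower, upper


# Example inputs taken from the repeatability study: CV of 1.85% from 64 measurements.
print(mckay_cv_interval(0.0185, 64))
```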

It should be noted that the chi-squared approximation-based bounds used in the repeatability analysis are not suitable for computing confidence intervals of the estimated CVs in the reproducibility analysis, because the underlying measurements from which the CV is estimated do not follow a Gaussian distribution, owing to the broad range of underlying donor fractions. We therefore computed these confidence intervals using a standard bootstrapping technique. Because of the inherent stochasticity of this approach, the particular values may vary slightly for each trial of the method. Confidence intervals for the estimated concordance between clinical samples were computed via the Clopper-Pearson method for binomial proportions; specifically, we used the closed-form expression of that method for a 100% observed success rate.
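The two interval methods named above can be sketched as follows (hypothetical function names). The bootstrap resamples the measurements of one group with replacement and takes percentile bounds of the resampled CVs; the number of resamples and the fixed seed are assumptions. For the concordance interval, when all n trials succeed the Clopper-Pearson bounds reduce to [(α/2)^(1/n), 1]. If the 12 clinical measurements form n = 6 replicate pairs (our reading of the duplicate design), this gives a lower bound of about 54.07%, matching the reported interval.

```python
import random
import statistics


def bootstrap_cv_interval(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) CI for the CV (%) of `values`."""
    rng = random.Random(seed)  # fixed seed only to make a run repeatable
    n = len(values)
    cvs = []
    for _ in range(n_boot):
        sample = [values[rng.randrange(n)] for _ in range(n)]
        cvs.append(100.0 * statistics.stdev(sample) / statistics.mean(sample))
    cvs.sort()
    lower = cvs[int((alpha / 2.0) * n_boot)]
    upper = cvs[int((1.0 - alpha / 2.0) * n_boot) - 1]
    return lower, upper


def clopper_pearson_all_successes(n, alpha=0.05):
    """Clopper-Pearson (1 - alpha) CI when all n of n trials succeed."""
    return (alpha / 2.0) ** (1.0 / n), 1.0


# Hypothetical replicate donor fractions (%) from one input/fraction group.
group = [0.58, 0.60, 0.61, 0.59, 0.62, 0.60, 0.57, 0.61]
print(bootstrap_cv_interval(group))
print(clopper_pearson_all_successes(6))
```

Changing the seed or `n_boot` shifts the bootstrap bounds slightly, which is the trial-to-trial variation noted in the text.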

Discussion

Kidney transplantation, pioneered in 1954 at Brigham Hospital, has resulted in a dramatic improvement in the quality of life for patients with kidney failure. The introduction of several generations of immunosuppressive treatments has brought down the rejection rate; however, it remains unacceptably high at about 5% per year, with more than half of allografts failing by year ten. Early detection of rejection in kidney transplant recipients holds the promise of improving this further, but remains an unmet need due to the lack of sensitive and non-invasive diagnostic kits. To diagnose acute renal transplant rejection, measurement of renal filtration function through a serum creatinine test is most commonly recommended. Although the serum creatinine test is inexpensive, detecting transplant rejection by measuring serum creatinine has physiologic limitations and is highly imprecise. The most definitive diagnosis of renal allograft dysfunction thus relies on histopathological evaluation of a percutaneous ultrasound-guided biopsy, which is invasive and can lead to major or minor complications such as bleeding. In addition, interobserver variability impedes the reliability of biopsies. Given the limitations of current methods, there remains a medical need for improved methods of detecting transplant rejection that are non-invasive, inexpensive, sensitive, specific, and have a rapid turnaround. The present disclosure provides a strong case for dd-cfDNA as a biomarker for monitoring the health of renal transplants that fulfills this need.

The present disclosure addressed the analytical validity of the donor fraction quantification method used in Sigdel et al. 2018. The clinical interpretation described in Sigdel et al. 2018 classified a patient as having increased risk of organ rejection when the donor fraction is greater than 1%. Thus, the analytical performance described herein should be interpreted in the context of accurately classifying a sample with respect to that threshold. From that perspective, we observe that the LoD and LoQ are 0.15% for unrelated donors and 0.29% for related donors, based on an LoQ definition of 20% CV, implying the ability to accurately quantify donor fraction at a level significantly lower than the classification threshold. These measurements were based on cell line-derived reference samples, and performance was estimated to be equivalent or superior using a smaller number of plasma-derived cfDNA samples. Similarly, the method was confirmed to have high accuracy based on linear regression with respect to an orthogonal measurement, with linear regression parameter confidence intervals including a slope equal to one and an intercept equal to zero, based on 349 related and 285 unrelated measurements. Performance was evaluated over a range of DNA input mass, which did not drive any consistently detectable performance difference over the tested range from 15 ng to 45 ng. Precision studies showed that the donor fraction measurement was stable across in-run and cross-run replicates, across multiple lots of critical reagents, and between repeat (concurrent) blood draws from the same patient. Accordingly, this study indicated that the test was appropriate for clinical implementation.

The present study was designed to assess performance independently in related versus unrelated donors due to concern that the higher rate of genotype concordance (implying a lower rate of informative genotypes) in a related donor scenario might limit the accuracy of the donor fraction estimate. This was tested using a large number of replicates of a mother-child cell line-derived donor pair, along with a smaller number of replicates of plasma-derived DNA from other subject pairs with relationships including siblings and lesser degrees of relatedness. We observed that LoB was higher in related donor pairs, which led to a correspondingly higher LoD. However, all of the other metrics, including linearity and the various precision metrics, were equivalent between related and unrelated donor pairs, showing that the quantitative performance of the test was not meaningfully impacted by the reduced number of informative genotypes, as confirmed across a variety of contrived samples. This statistical approach was also superior to a probability-based approach for modeling donor genotypes, because the statistical approach does not have to make any assumptions about the fraction of SNPs at which the donor has one allele versus two alleles different from the recipient.

Multiple ongoing registry studies are expected to demonstrate clinical utility for the dd-cfDNA assay; for example, it is expected to lead to more effective use of biopsy. Because dd-cfDNA is a marker of ongoing allograft injury, as opposed to creatinine, which is a lagging indicator of decreased function, it is expected to lead to earlier detection of kidney rejection. Earlier detection allows more rapid intervention in the case of rejection, possibly leading to lower de novo DSA levels, less allograft damage, and improved graft survival rates. Additionally, it may give nephrologists a tool to better optimize immunosuppressive regimens, with the goal of minimizing immunosuppressant-related toxicity without increasing the rate of rejection.

Example 4. KidneyScan

Introduction

With 20-30% of transplanted kidneys failing within five years and only 55% surviving to ten years, the limitations of the current standard of care for monitoring renal allograft rejection are severe and costly. The cost associated with a failed renal transplant patient may be 500% more than that of a patient with a functioning transplant. As such, there is a clear need for timely, sensitive, specific, non-invasive diagnostic tools to improve kidney transplant management. Applicant has created an assay, KidneyScan, that aids physicians in detecting rejection events earlier, avoiding unnecessary biopsies, and more safely optimizing immunosuppression levels to increase kidney graft survival rates.

KidneyScan is a non-invasive blood test validated for first-time kidney allograft recipients >18 years of age, at a minimum of two weeks post-transplant, across ethnicities. The assay is to be used in conjunction with the physician-assessed pretest probability, to further assess the probability of active renal allograft rejection. As a step before a new biopsy, KidneyScan may help appropriately rule in rejection when a patient otherwise appears stable and suspicion is unclear, or appropriately rule out rejection when a patient presents with a clinical risk of rejection.

The single-nucleotide polymorphism (SNP)-based massively multiplexed PCR (mmPCR) assay targets 13,926 SNPs to accurately detect allograft rejection/injury without the need for donor genotypes. Leveraging a proven biomarker and an established methodology, the SNP-based dd-cfDNA assay identifies active rejection by measuring the fraction of donor-derived cell-free DNA (dd-cfDNA) in the patient's blood, which contains a mixture of donor and recipient cell-free DNA. Because cells release dd-cfDNA upon graft injury or death, a higher dd-cfDNA fraction indicates a higher likelihood of active rejection.

In a recent blinded, large-scale prospective study of 217 biopsy-matched renal allograft samples, a retrospective analysis of the SNP-based dd-cfDNA assay demonstrated superior accuracy in detecting active rejection over the current standard of care (eGFR and serum creatinine), with high sensitivity (88.7% vs. 67.7% vs. 51.6%), specificity (72.6% vs. 65.3% vs. 67.5%) and AUC (0.87 vs. 0.74 vs. 0.68). Additionally, the SNP-based dd-cfDNA assay distinguished acute rejection from each non-rejection category (borderline injury, other injury and stable) significantly better than eGFR (P<0.0001 for each). These findings established dd-cfDNA as an earlier, more accurate biomarker for active rejection than the standard of care, one that can be used before renal function deteriorates. As acknowledged by the KDIGO guidelines, “detecting kidney allograft dysfunction as soon as possible will allow timely diagnosis and treatment”.

Furthermore, the SNP-based dd-cfDNA assay accurately distinguished a broad distribution of rejection types (antibody-mediated rejection, T-cell mediated rejection and combination) from non-rejection at the predefined cut-off of >1%. These rejection types are leading causes of allograft failure, occur in 20-25% of patients in the first 12-24 months, and are missed by current standard-of-care tools. Incorporating the SNP-based dd-cfDNA assay into transplant assessment protocols may lead to timely detection of rejection and earlier tailored immunosuppression treatment. KidneyScan offers physicians the clinical advantage of identifying active rejection (including subclinical rejection) earlier, comprehensively and non-invasively, to ultimately improve the care of kidney transplant patients.

Background

Chronic kidney disease (CKD), a worldwide health burden, affects 10% of the global population and results in adverse outcomes such as kidney failure, cardiovascular disease and premature death. Approximately 15% (30 million) of adults in the United States are estimated to have CKD, with close to 1 million persons having end stage renal disease (ESRD). Lifestyle diseases such as diabetes, atherosclerosis and hypertension, related to an aging society, have led to an increased prevalence of ESRD.

Kidney Transplantation

Kidney transplantation is a preferred treatment for ESRD; it is associated with lower morbidity and mortality and improved quality of life, and is cost-effective when compared to renal replacement therapy. However, according to the 2018 annual report published by the United States Renal Data System, in 2016, 70.1% of patients with ESRD were being treated with dialysis, and only 29.6% had a functioning kidney transplant. Annual US Medicare spending for CKD and ESRD combined exceeded $114 billion, with a yearly per-patient cost of approximately $90,971 for hemodialysis and $34,780 for kidney transplant in 2016. At present, more than 19,000 kidney transplants are performed in the US annually, resulting in approximately 200,000 patients living with a functioning kidney transplant.

Challenges and Unmet Need

Although kidney transplantation is the treatment of choice over dialysis, it poses a unique set of challenges, as the patient is maintained on life-long immunosuppressive regimens. Approximately 20-30% of patients experience overall renal graft failure within the first 5 years post-transplant, and only 55% of transplanted kidneys survive to 10 years. Kidney allograft rejection diagnosed pathologically is categorized into T-cell mediated and antibody-mediated rejection (TCMR/ABMR), based on the Banff 2013 schema. Therapeutic strategies focused on improving graft survival outcomes primarily relate to reducing the incidence and consequences of TCMR, but not ABMR. Despite advances in immunosuppression therapies and desensitization techniques, long-term graft survival depends on ABO or human leukocyte antigen (HLA) compatibility, with the latter identified as a significant risk factor for developing ABMR, ultimately leading to allograft loss. ABMR is a continuous process that can occur at different time points, leading to acute and chronic damage. Although advances in therapeutic strategies can reverse acute renal dysfunction, they cannot eliminate the donor-specific anti-HLA antibodies secreted by plasma cells originating from the spleen and bone marrow, which lead to a slowly progressive form of ABMR, referred to as subclinical ABMR, that can only be diagnosed through protocol biopsies. Another major factor that impacts the long-term allograft health of transplant recipients is a variety of viral infections, such as cytomegalovirus, Epstein-Barr virus, or BK virus, which arise from chronic immunosuppression. Given these clinical challenges, there is a clear need for an efficient post-transplant standard of care that brings precision medicine, with personalized tailoring of immunosuppressive drug regimens, to improve the management of kidney transplant.

Current Standard-of-Care and Limitations

Current standard-of-care options to monitor kidney health in transplant recipients include protocol (or surveillance) biopsies as well as assessment of dynamic changes in serum creatinine and other parameters, such as proteinuria and immunosuppressive drug levels. Although protocol biopsies are considered the “gold standard”, their clinical utility is significantly limited by invasiveness, cost, inadequate sampling, and poor reproducibility. In addition, protocol biopsies may be contraindicated in patients with uncontrolled hypertension, renal vascular anomalies, anticoagulant use, or acute pyelonephritis. To diagnose acute renal transplant rejection, renal filtration function is most commonly measured through a serum creatinine test or its algorithmic derivative, the estimated glomerular filtration rate (eGFR). Although inexpensive, serum creatinine is highly imprecise owing to its low sensitivity and specificity, and it has physiologic limitations: it is influenced by diet, muscle mass, medications such as trimethoprim and cimetidine, and new or recurrent disease. Moreover, creatinine is a lagging indicator of renal injury; by the time serum creatinine levels increase, the allograft has already undergone severe and irreversible damage.

The limitations of the current standard of care expose an unmet need for a rapid, accurate, and noninvasive approach to detecting allograft rejection and/or injury, which may require integrating the current “gold” standard morphological assessments with modern molecular diagnostic tools.

Donor-Derived Cell-Free DNA (Dd-cfDNA)—Noninvasive Biomarker

Donor-derived cell-free DNA (dd-cfDNA), which can be measured noninvasively in the plasma of transplant patients, is a proven biomarker for kidney transplant rejection and holds promise for producing faster and more quantitative results than current monitoring options. Because cfDNA has a short half-life in blood (<1 hr), it provides an opportunity for rapid, dynamic assessment and potentially early diagnosis of allograft injury. Specifically, it has the potential to improve the use of protocol biopsies, i.e., to reduce unnecessary biopsies. In addition, it can indicate a possible need for biopsy in patients with subclinical rejection who appear clinically stable, thereby facilitating personalization of treatment regimens for an optimal outcome.

Applicant has established, long-standing expertise in the dd-cfDNA domain, ranging from reproductive health to oncology, and looks forward to applying this technology to help nephrologists better care for their renal transplant patients. The following sections present a detailed overview of KidneyScan, Applicant's SNP-based dd-cfDNA technology, followed by a description of our analytical and clinical validation of the assay.

Sample Processing and Sequencing

Applicant's transplant test measures the fraction of donor-derived cell-free DNA (dd-cfDNA) in total cell-free DNA (cfDNA) derived from the blood plasma of transplant patients. The method is described in and has since included minor updates for compatibility with Applicant's CLIA lab, such as changing from the HiSeq to the NextSeq sequencer. The plasma workflow includes cfDNA extraction using Applicant's in-house proprietary chemistry, library amplification, and amplification of a set of single nucleotide polymorphism (SNP) loci using targeted massively multiplex PCR. Donor fraction is estimated using thousands of SNPs located on chromosomes 2, 13, 18, and 21. The SNPs were selected for high minor allele frequency across multiple ethnicities based on a large reference dataset. High-throughput sequencing is performed on an Illumina NextSeq, followed by demultiplexing and mapping to the human reference genome. The donor fraction estimate is based on the allele ratios observed at the targeted SNP locations.

Donor Fraction Calculation

Donor fraction is calculated from the set of SNPs at which the recipient is homozygous, with genotype either RR (homozygous reference allele) or MM (homozygous mutant allele). The general principle is as follows. When the recipient has genotype RR and the donor has genotype RM, the observed fraction of the M allele corresponds to half the donor fraction. When the recipient has genotype RR and the donor has genotype MM, the observed fraction of the M allele corresponds to the full donor fraction. When the recipient and donor both have genotype RR, the SNP does not inform the estimate. Genotype combinations in which the recipient is MM are interpreted in the same way, with the roles of the R and M alleles exchanged.
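The genotype rules above can be written as a short lookup. The following is a minimal sketch assuming error-free reads and known recipient and donor genotypes; the function names are hypothetical and not part of the disclosed assay, which does not observe donor genotypes directly.

```python
from typing import Optional


def _absent_allele(recipient: str) -> str:
    """Allele that the homozygous recipient does not carry."""
    if recipient == "RR":
        return "M"
    if recipient == "MM":
        return "R"
    raise ValueError("only recipient-homozygous SNPs (RR or MM) are used")


def expected_fraction(recipient: str, donor: str, donor_fraction: float) -> float:
    """Expected fraction of reads carrying the allele absent from the recipient."""
    copies = donor.count(_absent_allele(recipient))  # 0, 1 or 2 donor copies
    return donor_fraction * copies / 2


def single_snp_estimate(recipient: str, donor: str, observed_fraction: float) -> Optional[float]:
    """Invert the rule at one SNP; None when the SNP carries no donor signal."""
    copies = donor.count(_absent_allele(recipient))
    if copies == 0:
        return None  # e.g. recipient RR and donor RR: uninformative
    return observed_fraction * 2 / copies
```

For example, a recipient RR / donor RM SNP with 5% M reads implies a 10% donor fraction, while the same 5% at a recipient RR / donor MM SNP implies 5%; this ambiguity is why donor genotypes must be modeled when they are not known.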

The mathematical approach is a maximum likelihood estimate over a fixed search range, combining data from recipient-homozygous SNPs. The data likelihood is calculated for each candidate donor fraction, and the donor fraction estimate is the candidate value that produces the maximum data likelihood. This can be interpreted as choosing the candidate value that best explains the observed sequencing data according to a mathematical model. Donor genotype estimates are incorporated into the data likelihood calculation based on their prior (population-based) probabilities and the observed data. This method does not require any heuristic adjustment factors for varying degrees of recipient-donor relationship. However, when there is a relationship (indicated on the test requisition form), we constrain the genotype prior probabilities to reflect the required genotype concordance.
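A minimal sketch of such a grid-search maximum likelihood estimate follows. It assumes a binomial read-count model with a symmetric per-read error rate (`err`), Hardy-Weinberg donor genotype priors for unrelated donors, and, as one hypothetical example of a relationship constraint, a parent/child prior in which the donor shares one allele with the recipient. The SNP tuple layout, function names, error rate, and search range are all illustrative assumptions, not Applicant's implementation.

```python
import math
from typing import List, Sequence, Tuple

# One recipient-homozygous SNP: (reads carrying the allele the recipient lacks,
# total read depth, population frequency of that allele). Hypothetical layout.
Snp = Tuple[int, int, float]


def donor_genotype_priors(p: float, relationship: str = "unrelated") -> List[Tuple[int, float]]:
    """Prior over donor copies (0, 1, 2) of the allele the recipient lacks."""
    if relationship == "unrelated":
        # Hardy-Weinberg proportions from the population allele frequency
        return [(0, (1 - p) ** 2), (1, 2 * p * (1 - p)), (2, p ** 2)]
    if relationship == "parent_child":
        # One donor allele is shared with the recipient, so two copies are impossible
        return [(0, 1 - p), (1, p)]
    raise ValueError(f"unsupported relationship: {relationship}")


def snp_log_likelihood(snp: Snp, f: float, relationship: str, err: float) -> float:
    """Log-likelihood of one SNP at donor fraction f, summing over donor genotypes."""
    alt, depth, p = snp
    terms = []
    for copies, prior in donor_genotype_priors(p, relationship):
        if prior <= 0:
            continue
        base = f * copies / 2  # expected fraction without sequencing error
        q = base * (1 - err) + (1 - base) * err  # symmetric per-read error
        # Binomial coefficient omitted: it does not depend on f
        terms.append(math.log(prior) + alt * math.log(q) + (depth - alt) * math.log(1 - q))
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))  # log-sum-exp


def estimate_donor_fraction(snps: Sequence[Snp], relationship: str = "unrelated",
                            err: float = 1e-3, f_max: float = 0.3, step: float = 1e-3) -> float:
    """Grid-search maximum likelihood estimate of the donor fraction over [0, f_max]."""
    best_f, best_ll = 0.0, -math.inf
    for i in range(int(round(f_max / step)) + 1):
        f = i * step
        ll = sum(snp_log_likelihood(s, f, relationship, err) for s in snps)
        if ll > best_ll:
            best_f, best_ll = f, ll
    return best_f


if __name__ == "__main__":
    # Hypothetical noise-free counts consistent with a donor fraction near 10%
    demo = [(0, 1000, 0.5)] * 5 + [(50, 1000, 0.5)] * 10 + [(100, 1000, 0.5)] * 5
    print(estimate_donor_fraction(demo))
```

Summing over donor genotypes in log space keeps the likelihood finite at deep coverage, and swapping the prior function is the only change needed to express a relationship constraint.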

Summary of Analytical Performance

Analytical performance was assessed according to CLSI guidelines. All analytical performance results, including accuracy, limit of quantitation, and precision, were satisfactory in the context of the proposed clinical use. We highlight two important findings from our analytical performance studies: (1) performance of the assay in related and unrelated donor/recipient pairs, given that 10-15% of renal allografts involve highly related individuals; and (2) comparison of performance to another commercially available dd-cfDNA assay that has received a positive local coverage determination (LCD) from MolDx.

Analytical Performance in Related VS Unrelated Samples

SNP-based donor fraction estimates depend on differences between the recipient and donor genotypes. Departures from the expected rates of the different recipient-donor genotype pairs may affect the accuracy of the estimate, and methods using an insufficient number of SNP measurements are especially susceptible to these risks. KidneyScan's use of probabilistic genotype modeling combined with thousands of SNP measurements enables equivalent performance in related and unrelated donor scenarios, as confirmed by testing on mixture samples created from related individuals. KidneyScan achieves equivalent accuracy and precision for related and unrelated donors; the only difference in performance is in the limit of blank (LoB), leading to a correspondingly minimal difference in the limit of detection (LoD) and limit of quantitation (LoQ), all of which remain far from the classification threshold.
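One way to see why the number of SNPs matters for related donors: at a recipient-homozygous SNP, the donor is informative only if it carries the allele the recipient lacks, and relatives share alleles identical by descent (IBD). The sketch below computes that probability from the standard IBD-sharing coefficients (k0, k1, k2) for a few relationships, assuming no inbreeding. It is an illustration of the genetics, not the assay's model, and the function name is hypothetical.

```python
def prob_donor_informative(p: float, relationship: str) -> float:
    """P(donor carries >= 1 copy of the allele a homozygous recipient lacks).

    p is the population frequency of that allele; (k0, k1, k2) are the
    probabilities of sharing 0, 1 or 2 alleles identical by descent.
    """
    ibd = {
        "unrelated": (1.0, 0.0, 0.0),
        "parent_child": (0.0, 1.0, 0.0),
        "full_sibling": (0.25, 0.5, 0.25),
        "half_sibling": (0.5, 0.5, 0.0),
    }
    k0, k1, k2 = ibd[relationship]
    no_ibd = 1 - (1 - p) ** 2  # both donor alleles drawn from the population
    one_ibd = p                # one allele copied from the recipient, the other random
    two_ibd = 0.0              # donor genotype equals the recipient's: never informative
    return k0 * no_ibd + k1 * one_ibd + k2 * two_ibd
```

At an allele frequency of 0.5, an unrelated donor is informative at 75% of recipient-homozygous SNPs, a parent or child at 50%, and a full sibling at about 44%, so a related-donor estimate draws on fewer informative SNPs from the same panel.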

Comparison of KidneyScan Analytical Performance to Other Dd-cfDNA Assays

KidneyScan analytical performance was compared to that of a commercially available dd-cfDNA assay whose analytical performance is described in Grskovic et al, 2016. FIG. 40 shows similarly high-quality accuracy assessment data for the two assays, both compared against digital droplet PCR as the orthogonal reference measurement. KidneyScan is accurate with respect to the reference measurement, as indicated by a linear fit with slope approximately 1, intercept approximately 0, and R-squared approximately 1.

KidneyScan has analytical limits (LoB, LoD, LoQ) similar to those of the previously published assay. None of the relevant limits is close to the classification threshold of 1%, which implies that they will not limit clinical accuracy. Additionally, KidneyScan has better repeatability (5-fold) and inter-run precision (2.3-fold), as measured by CV near the classification threshold. Table 14 shows performance for unrelated donors because the Grskovic study did not directly assess performance for related donors, but instead addressed this scenario using an in silico adjustment of measurements from unrelated donors.

TABLE 14 Comparison of KidneyScan and Grskovic dd-cfDNA assay key performance metrics

| Metric | KidneyScan | Bloom et al [Grskovic et al, 2016] |
|---|---|---|
| Limit of Blank (%) | 0.11 | 0.10 |
| Limit of Detection (%) | 0.15 | 0.15 |
| Limit of Quantitation (%) | 0.15 | 0.20 |
| Repeatability (within run) at 0.6% donor fraction (CV) | 1.85 | 9.2 |
| Inter-run precision (CV) | 1.99 | 4.5 |
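The fold improvements cited above are ratios of the comparator CV to the KidneyScan CV in Table 14; reading the first value column as KidneyScan is the assignment consistent with those figures:

```latex
\frac{9.2}{1.85} \approx 4.97 \approx 5 \quad \text{(repeatability)},
\qquad
\frac{4.5}{1.99} \approx 2.26 \approx 2.3 \quad \text{(inter-run precision)}
```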

Clinical Validity

Our assay has been shown to identify all types of active rejection (AR) with greater sensitivity and specificity than serum creatinine or estimated glomerular filtration rate (eGFR), the current standard of care. This performance validation underscores the assay's potential use (1) as a better tool for the early, non-invasive identification of AR; (2) to avoid biopsy when it is unnecessary (no actionable finding) or contraindicated; and (3) to personalize immunosuppression therapy. In brief, this section includes a short description of the clinical validation that has been conducted, as well as a discussion of five performance aspects by which this test was clinically evaluated:

-   89% sensitivity and 73% specificity for detection of AR
-   High accuracy in detecting subclinical rejection with 92% sensitivity
-   More reliable than SCr and eGFR for detection of AR
-   Test performance independent of rejection type, including ABMR and TCMR
-   Test performance independent of donor type, including living/deceased and related/unrelated

Test performance was validated in a population with broad diversity of age and ethnicity. This is a clinically significant advantage of our study over Bloom et al, 2017, whose patient population was less diverse. Graft survival and patient management are known to vary by ethnicity; for example, the eGFR metric is calculated from serum creatinine (SCr), with adjustment for age, sex, and ethnicity.

89% Sensitivity and 73% Specificity for Detection of AR

A comparison of Applicant's dd-cfDNA test, the dd-cfDNA test described in Bloom et al, 2017, and eGFR shows the superiority of dd-cfDNA over the current standard (Table 15). It also shows higher sensitivity, AUC, and NPV for Applicant's dd-cfDNA assay compared with Bloom, indicating that the performance of Applicant's methods is as good as or better than that of the assay described in Bloom.

TABLE 15 Comparison of dd-cfDNA Tests Performance to eGFR

| | Applicant's dd-cfDNA Test | Bloom et al, 2017 | eGFR |
|---|---|---|---|
| Cutoff level for identifying AR | >1% | >1% | <60 |
| Overall performance | | | |
| Sensitivity | 88.7% (95% confidence interval [CI], 77.7%-99.8%) | 59% (95% CI, 44%-74%) | 67.8% (95% CI, 51.3%-84.2%) |
| Specificity | 72.6% (95% CI, 65.4%-79.8%) | 85% (95% CI, 79%-91%) | 65.3% (95% CI, 57.6%-73.0%) |
| AUC | 0.87 (95% CI, 0.80-0.95) | 0.74 (95% CI, 0.61-0.86) | 0.74 (95% CI, 0.66-0.83) |
| PPV* | 52.0% (95% CI, 44.7%-59.2%) | 57% | 39.4% (95% CI, 31.6%-47.3%) |
| NPV* | 95.1% (95% CI, 90.5%-99.7%) | 86% | 85.9% (95% CI, 75.9%-92.2%) |
| PPV^(†) | 36.4% (95% CI, 29.6%-43.1%) | 41% | 25.6% (95% CI, 19.4%-31.9%) |
| NPV^(†) | 97.3% (95% CI, 94.8%-99.9%) | 92% | 92.0% (95% CI, 88.1%-95.8%) |

*Assumes a 25% prevalence of rejection (at-risk population).
^(†)Assumes a 15% prevalence of rejection (lower-risk population).
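The predictive values in Table 15 follow from sensitivity, specificity, and the assumed prevalence by Bayes' rule. A minimal sketch, using the rounded point estimates from the table as inputs (so results can differ from the table in the last digit); the confidence intervals reported for PPV and NPV are not reproduced here, and the function name is illustrative.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a binary test at the given disease prevalence."""
    tp = sensitivity * prevalence               # true positives per unit population
    fp = (1 - specificity) * (1 - prevalence)   # false positives
    fn = (1 - sensitivity) * prevalence         # false negatives
    tn = specificity * (1 - prevalence)         # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Rounded point estimates from Table 15
for label, sens, spec in [("Applicant's dd-cfDNA", 0.887, 0.726), ("eGFR", 0.678, 0.653)]:
    for prevalence in (0.25, 0.15):
        ppv, npv = predictive_values(sens, spec, prevalence)
        print(f"{label}, prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

The same function applied to the protocol-biopsy subset (92.3% sensitivity, 75.2% specificity, 25% prevalence) gives approximately the 55.4% PPV and 96.7% NPV projected in that section.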

High Accuracy in Detecting Subclinical Rejection with 92% Sensitivity

FIG. 41 shows assay performance for the subset of samples drawn at the time of a for-cause biopsy and protocol biopsy; performance in protocol biopsies is expected to reflect performance when the assay is used in routine surveillance, that is, when there are no signs of renal injury. This cohort of 114 samples showed detection of AR with:

-   92.3% sensitivity (95% CI, 64.0%-99.8%)
-   75.2% specificity (95% CI, 65.7%-83.3%)
-   0.89 area under the curve (AUC) (95% CI, 0.76-0.99)
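The reported sensitivity interval is consistent with an exact (Clopper-Pearson) binomial interval on 12 of 13 AR cases detected; the source does not state the counts or the interval method, so both are assumptions here. A stdlib-only sketch of that interval, with hypothetical function names:

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f: Callable[[float], float], target: float, increasing: bool, iters: int = 100) -> float:
    """Bisection on [0, 1] for f(p) = target, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # root lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for the proportion k/n."""
    # Lower bound solves P(X >= k | p) = alpha/2, which increases with p
    lower = 0.0 if k == 0 else _solve(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, True)
    # Upper bound solves P(X <= k | p) = alpha/2, which decreases with p
    upper = 1.0 if k == n else _solve(lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper


print(clopper_pearson(12, 13))
```

The same function reproduces the 54.07% lower bound quoted for 100% concordance across 6 patient-sample replicate pairs in the precision results. With so few rejection cases in the protocol-biopsy subset, the sensitivity interval is wide even though the point estimate is high.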

Based on a 25% prevalence of rejection in an at-risk population, the following value projections could be made:

-   Positive predictive value (PPV) of 55.4% (95% CI, 46.2%-64.7%)
-   Negative predictive value (NPV) of 96.7% (95% CI, 90.6%-99.9%)

More Reliable than SCr and eGFR for Detection of AR

The data showed that Applicant's assay accurately distinguishes between AR and non-AR grafts, with the fraction of dd-cfDNA significantly higher in the circulating plasma of the AR group (median=2.32%) than in the non-rejection group (median=0.47%; P<0.0001) (FIG. 42). In contrast to dd-cfDNA, eGFR scores had less discriminatory ability for differentiating AR from the individual non-rejection groups.

TABLE 16 Summary Statistics for dd-cfDNA and eGFR Tests

| Parameter | Active Rejection | Non-Rejection |
|---|---|---|
| dd-cfDNA | | |
| Number of samples | 38 | 179 (82.5) |
| Mean (SD), % | 4.64 (5.45) | 0.92 (1.28) |
| Median (Range), % | 2.32 (0.1-23.9) | 0.47 (0.04-6.78) |
| eGFR* | | |
| Number of samples | 38 | 179 (82.5) |
| Mean (SD), score | 49.0 (22.3) | 77.0 (8.45) |
| Median (Range), score | 45.67 (8.0-100.4) | 76.06 (6.4-131.1) |

*One sample had missing weight information needed to calculate eGFR.

Test Performance Independent of Rejection Type, Including ABMR and TCMR

FIG. 43 shows the relationship between dd-cfDNA level and type of rejection. Median dd-cfDNA did not differ significantly among the ABMR (2.2%), ABMR/TCMR (2.6%), and TCMR (2.7%) groups (P=0.855). The study included a range of pathologies, and the data indicate that this assay is robust across different types of active rejection.

These results are novel considering that a previous study by Bloom et al., 2017, which used a different assay, demonstrated an inability to differentiate TCMR from stable allograft (STA): that study found significantly lower dd-cfDNA levels for TCMR (<1.2%) than for ABMR (2.9%). This clinically significant finding differentiates Applicant's assay and supports expanded clinical utility relative to tests currently available on the market.

Test Performance Independent of Donor Type, Including Living/Deceased and Related/Unrelated

Given the design of the assay used here, it is possible to quantify dd-cfDNA without prior recipient or donor genotyping. There was no significant difference in median dd-cfDNA levels between any of the non-rejection donor groups; although the AR groups appeared similar across donor groups, there were not enough samples to make a statistical comparison (FIG. 44). Evaluation of dd-cfDNA levels by donor type (living related, living non-related, deceased non-related) revealed that dd-cfDNA levels were similar across all donor types within the AR and non-rejection categories.

In conclusion, this rapid, accurate, and noninvasive technology allows detection of clinically impactful renal injury better than the current standard of care, with the potential for better patient management, more targeted biopsies, and improved renal allograft function and survival.

Example 5. Clinical Utility

The clinical utility of early detection and treatment of active allograft rejection is well established. We have previously outlined the limitations of existing diagnostic tools for detecting active rejection and the need for a test that is both sensitive and non-invasive. Our dd-cfDNA assay meets this need, with utility to be measured in terms of:

-   Fewer unnecessary biopsies (where there is no AR diagnosis)
-   More frequent detection of subclinical AR
-   More targeted and personalized use of immunosuppression therapy. This change will likely take more time to observe than the change in biopsy use, since physicians may be slower to adjust their immunosuppression treatment patterns than their biopsy decisions.

How the Test Will be Used in Practice

We recommend that physicians use the test whenever there is a suspicion of rejection, to help rule in or rule out active rejection, to inform the need for a diagnostic biopsy, and to inform treatment decisions when a biopsy is contraindicated. The incidence of AR is highest in the first 12 months post-transplant, so we anticipate more frequent use during that period and less frequent use after 12 months post-transplant.

Test results will be available as soon as 3 calendar days from specimen receipt in the laboratory. We are highly confident in our laboratory's ability to process these specimens with speed and quality, as we already process >1000 cfDNA tests daily for OB/GYN physicians to support care for their pregnant patients.

Test results will include the observed dd-cfDNA level (a.k.a. “donor fraction”), a clear statement of whether it falls above or below the predetermined cutoff of 1%, a summary statement indicating high or low risk of rejection, and the post-test risk of rejection estimated using a background AR prevalence of 25%.
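A minimal sketch of how the report fields described above could be assembled. The field names, the wording of the risk statement, and the use of likelihood ratios are illustrative assumptions; the defaults are the overall sensitivity and specificity from the clinical validation, and the donor fraction is expressed as a fraction (0.01 = 1%).

```python
from dataclasses import dataclass

CUTOFF = 0.01  # predetermined 1% donor-fraction cutoff, expressed as a fraction


@dataclass
class Report:
    donor_fraction: float
    above_cutoff: bool
    risk_statement: str
    post_test_risk: float


def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    post_odds = pre_test / (1 - pre_test) * likelihood_ratio
    return post_odds / (1 + post_odds)


def build_report(donor_fraction: float, sensitivity: float = 0.887,
                 specificity: float = 0.726, prevalence: float = 0.25) -> Report:
    above = donor_fraction > CUTOFF
    if above:
        lr = sensitivity / (1 - specificity)   # positive likelihood ratio
    else:
        lr = (1 - sensitivity) / specificity   # negative likelihood ratio
    return Report(
        donor_fraction=donor_fraction,
        above_cutoff=above,
        risk_statement=("High" if above else "Low") + " risk of active rejection",
        post_test_risk=post_test_probability(prevalence, lr),
    )


print(build_report(0.0232))
```

For a single binary cutoff, the odds form gives the same post-test risk as PPV for a result above the cutoff and 1 - NPV for a result below it (about 52% and 5% at 25% prevalence); the likelihood-ratio form is convenient if a different background prevalence is later used.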

How the Test Will Change Physician Decision-Making

There are many cases in which physicians suspect active rejection but are uncertain, leading to missed diagnoses as well as unnecessary biopsies.

-   In cases of stable SCr and donor fraction >1%, we believe physicians will switch their decision from observation to biopsy, to catch subclinical rejection and begin treatment.
-   In cases of moderately elevated SCr and donor fraction <<1%, we believe physicians will often switch their decision from biopsy to observation, and will seek other explanations for decreased kidney function besides active rejection. It is well established that SCr will often return to normal levels without any additional treatment.
-   In cases of severely elevated SCr (such as SCr >2.5), we believe physicians will pursue diagnostic biopsy without waiting for the dd-cfDNA result.

In the Sigdel et al. 2017 study, 76% of clinically indicated (for-cause) biopsies and 89% of surveillance/protocol biopsies did not result in a diagnosis of active rejection, implying that these biopsies were unnecessary. With an overall test specificity of 73%, if physicians made clinical decisions based solely on the dd-cfDNA test result, 73% of the unnecessary biopsies could have been avoided. However, we anticipate that physicians will incorporate the dd-cfDNA result as one of several factors in the biopsy decision, not the sole factor. We therefore hypothesize that a significant portion of these unnecessary biopsies, perhaps 40-50%, will be avoided. This hypothesis will be evaluated in prospective outcomes studies, as described below.

Renal transplant recipients are fundamentally a high-risk population with ESRD, the unmet need is high, and the test performance is strong. The outcomes study is therefore designed to answer the question of how much clinical practice will change, not whether it will change.

Clinical Advantage: Identifying Active Rejection and Subclinical Rejection

The immunological processes that lead to renal allograft rejection are heterogeneous, involving humoral and cellular immune responses. In addition to ABMR and TCMR, which are the leading causes of allograft failure, subclinical rejection is also associated with chronic allograft nephropathy. Subclinical rejection, defined as histologically proven acute rejection in the absence of a decline in renal function, is considered the most common cause of late renal allograft failure, occurring in 20-25% of patients in the first 12-24 months. The likelihood of subclinical rejection depends on the time after transplantation, prior acute rejection, HLA mismatch, and immunosuppression. One study showed that patients with subclinical rejection had lower graft survival rates than patients with normal or borderline changes at 1, 5, and 10 years.

Timely treatment of subclinical rejection has the potential to change the long-term therapeutic outcome of renal transplants, as evidenced by a study showing that treatment of subclinical rejection led to a reduction in early (months 2 and 3) and late (>6 months) clinical rejections, a lower chronic tubulointerstitial score at 6 months, and better graft function at 2 years. The major limitation in treating subclinical rejection is that a high proportion of cases remain unrecognized due to the limited sensitivity of the current standard of care, so that a definitive diagnosis requires a surveillance biopsy. Thus, early detection techniques such as dd-cfDNA testing have the potential to non-invasively detect possible subclinical rejection, thereby facilitating better treatment outcomes and increasing graft survival rates.

Patient Risk Stratification and Utility of Dd-cfDNA

Recipients of kidney transplants represent a heterogeneous population with varying risk of rejection and infection across patient subgroups. Below are some of the factors that collectively influence a clinician's classification of a patient's risk for rejection:

-   De novo donor-specific antibody (dnDSA) formation
-   Interstitial fibrosis and tubular atrophy (IF/TA)
-   Delayed allograft function
-   Panel reactive antibody (PRA) >30%
-   Inadequate immunosuppressive therapy
-   Calcineurin inhibitor nephrotoxicity
-   Underlying disease
-   Deceased donor
-   Younger recipient age
-   Older donor age
-   African American ethnicity
-   Cold ischemia time >24 hours
-   HLA mismatch
-   ABO incompatibility

The treatment regimen for each patient is highly variable and depends on the risk category. At present, no clear guidelines exist for stratifying patients into different risk groups. In general, patients at higher risk for rejection are monitored more closely, and their management is left solely to the physician's discretion. This leads to high variability and non-optimal outcomes in many situations. Applicant's methods can be highly effective at addressing this unmet need: dd-cfDNA can help physicians make well-informed decisions by stratifying patients into risk groups, reducing treatment variability, and avoiding unnecessary biopsies.

In addition to increasing the likelihood of long-term graft survival, early detection of active injury by dd-cfDNA has other potential benefits. An accurate dd-cfDNA assay can help physicians manage renal allograft health by maintaining a minimally effective dose of immunosuppressants to prevent rejection while avoiding their associated complications, such as:

-   BK viremia
-   Increased susceptibility to other infections
-   Calcineurin inhibitor nephrotoxicity
-   Increased incidence of cancer

Studies examining morbidity and mortality in long-term allograft recipients demonstrate that cardiovascular disease and cancer are the two most common causes of death. Risk stratification models have been proposed that use individual risk profiles to tailor immunosuppressive and antibacterial treatments. We believe that the test will support accelerated tapering of immunosuppression for patients who have dd-cfDNA levels consistently <1%.

Supporting Studies Currently Underway

To provide additional evidence of the clinical utility of Applicant's dd-cfDNA test, two studies are currently in progress, the results of which will be submitted for peer-reviewed publication.

1. Dd-cfDNA Clinical Utility Randomized Controlled Trial

-   a. Timeline: Preliminary results expected early 2019
-   b. Design: Pre-post, two-round controlled trial of care practices in a nationally representative sample of practicing nephrologists. Nephrologists manage virtual patient cases before and after receiving education about the dd-cfDNA assay and test results.
-   c. Objective(s):
    -   i. Determine current management protocols and variations in post-transplant management among practicing nephrologists
    -   ii. Assess the impact of the novel dd-cfDNA biomarker on patient management
-   d. Expected outcome(s):
    -   i. Nephrologists are highly variable in their ability and approach to assess kidney health and transplant rejection status
    -   ii. Applicant's transplant rejection test will improve patient management in key use cases, enabling nephrologists to better monitor post-kidney transplant patients' health, and optimize biopsy use and immunosuppressant regimens.

2. Dd-cfDNA Registry Study

-   a. Timeline: Five years to study completion, with annual readouts.
-   b. Design: 1,500 patients tested at 1, 2, 3, 4, 6, 9, and 12 months post-transplant, and quarterly thereafter. Blood also drawn at the time of, and one month after, for-cause biopsies. Patients followed for three years.
-   c. Objective: to show change in clinical practice and improved outcomes
-   d. Primary study endpoints:
    -   i. Precision biopsy usage (% of biopsies resulting in active rejection diagnosis)
    -   ii. Physician decision-making (with and without dd-cfDNA result)

Summary

The limitations of the current standard of care for monitoring active or early injury in patients after kidney transplant affect long-term graft survival. Medical advances that have improved short-term kidney graft survival rates have not improved long-term graft survival rates. Furthermore, the current literature supports subclinical rejection as an underlying cause of adverse clinical outcomes leading to long-term graft rejection.

dd-cfDNA is an ideal biomarker to add to post-transplant management of active rejection, as it can non-invasively detect early active rejection and subclinical rejection with superior sensitivity and specificity compared with serum creatinine and eGFR. This allows timely tailoring of the immunosuppressive regimen based on the inflammatory status of the graft, providing a more personalized approach to minimizing both the incidence of rejection and unwanted side effects. Frequent analysis, combined with early detection, can improve long-term allograft function and decrease the number of unnecessary biopsies in stable patients.

Example 6. KidneyScan® Donor-Derived Cell-Free DNA Test

Summary of Evidence

In 2016, over 20,000 kidney transplants were performed in the United States. In addition, over 80,000 surgical candidates were on dialysis while they waited for an available kidney. After transplantation, patients are put on immunosuppressant drug therapy and routinely monitored to prolong the survival of the donor kidney. Despite established protocols, allograft survival rates ten years post-transplant are estimated to be as low as 48% for deceased donors and 65% for living donors.

Advances in kidney transplantation and post-transplantation care have continued to improve organ function and survival rates over time. While this has been most evident in the successful treatment of acute kidney rejection in the first year post-transplant, outcomes after this period have remained relatively unchanged for decades (failure rates of 3-5% per year for deceased donor kidneys and 2-3% per year for living donor kidneys). Kidney injury, leading to irreversible damage and eventual graft loss, is often asymptomatic for weeks or months and can be challenging to detect with the current standard of care of measuring serum creatinine (SCr) or its algorithmic derivative, estimated glomerular filtration rate (eGFR). Both have significant limitations for early injury detection.

The KidneyScan test detects donor-derived cell-free DNA (dd-cfDNA) in a recipient's blood; dd-cfDNA levels are elevated during active rejection due to increased cell death in the organ. KidneyScan is an effective, non-invasive method of assessing kidney allograft status with better performance than the current standard of care.

KidneyScan Donor-Derived Cell-Free DNA Test Description and Performance

The KidneyScan assay is a cell-free DNA-based, next-generation sequencing assay that analyzes over 13,000 single-nucleotide polymorphisms (SNPs) to accurately quantify the fraction of dd-cfDNA in the transplant recipient's blood, even in related recipient/donor pairs, without separate genotyping of either donor or recipient. The dd-cfDNA fraction in cfDNA can be measured with a turnaround time of 5 days or less, which is necessary for the appropriate management of transplant recipients.

The clinical performance of KidneyScan was evaluated in a retrospective analysis of 217 samples from 178 unique transplant recipients whose kidney transplants were performed at the University of California, San Francisco (UCSF) Medical Center. The data showed that dd-cfDNA levels in patients with active rejection (AR) were significantly higher than in patients with stable allograft (STA), borderline (BL), or other injury (OI). Importantly, the trend was clear regardless of the type of rejection: antibody-mediated rejection (ABMR) or T-cell-mediated rejection (TCMR). We believe the elevated dd-cfDNA levels are indicative of ongoing injury to the transplanted organ. We therefore analyzed the ability of the assay to detect AR versus non-rejection, where non-rejection is defined as all specimens classified as STA, BL, or OI.

The amount of dd-cfDNA was significantly higher in the circulating plasma of the AR group (median=2.32%) than in the non-rejection group (median=0.47%; P<0.0001).

Using a predetermined dd-cfDNA cutoff of 1%, the data showed the KidneyScan assay to have 88.7% sensitivity (95% confidence interval [CI], 77.7%-99.8%) and 72.6% specificity (95% CI, 65.4%-79.8%) for detection of AR. The area under the curve (AUC) was 0.87 (95% CI, 0.80-0.95). Based on a 25% prevalence of rejection in an at-risk population, the positive predictive value (PPV) was projected to be 52.0% (95% CI, 44.7%-59.2%) and the negative predictive value (NPV) was projected to be 95.1% (95% CI, 90.5%-99.7%).

Furthermore, for the subset of 114 samples drawn at the time of a protocol biopsy, which is expected to reflect performance when the assay is used in routine surveillance, the KidneyScan assay detected AR with 92.3% sensitivity (95% CI, 64.0%-99.8%), 75.2% specificity (95% CI, 65.7%-83.3%), and an AUC of 0.89 (95% CI, 0.76-0.99). Based on a 25% prevalence of rejection in an at-risk population, the PPV was projected to be 55.4% (95% CI, 46.2%-64.7%) and the NPV 96.7% (95% CI, 90.6%-99.9%). These data suggest that application of the KidneyScan assay in a clinical setting could potentially reduce the need for protocol biopsies. It may also be appropriate for use post-rejection, in place of biopsy, to determine whether immunosuppressant dosing has led to clearance of the rejection episode.

Median dd-cfDNA did not differ significantly among the different types of rejection: ABMR (2.2%), ABMR/TCMR (2.6%), and TCMR (2.7%) (P=0.855). These results are novel considering that a previous study, using a different assay, found significantly higher dd-cfDNA levels for ABMR (2.9%) than for TCMR (<1.2%), indicating an inability to detect T-cell-mediated rejection. Although the assay used in that study also measured dd-cfDNA, the methods used by the two assays differ significantly. It is unclear whether that test could not differentiate AR from non-rejection in cases of TCMR or whether the result was due to the smaller sample size of that group in that study (n=11). Regardless, the KidneyScan assay can accurately discriminate AR from non-rejection across a range of pathologies, including both acute and chronic findings, in both the ABMR and TCMR groups.

The analytical and clinical performance of the KidneyScan assay is summarized below.

General

Intended Use.

The KidneyScan test is intended to supplement the evaluation and management of kidney injury and active rejection in patients who have undergone renal transplantation. It can inform decision making along with standard clinical assessments.

Specimen Type.

Plasma collected in Streck Cell-Free DNA BCT® tubes

Description Results

Accuracy.

Unrelated donor:

Slope=1.0664 (95% CI 0.9416, 1.1912)

Intercept=0.0008 (95% CI −0.0076, 0.0092)

R-squared=0.9997 (95% CI 0.9997, 0.9998)

Related donor:

Slope=1.0333 (95% CI 0.9241, 1.1425)

Intercept=−0.0001 (95% CI −0.0047, 0.0046)

R-squared=0.9989 (95% CI 0.9986, 0.9990)

Accuracy was assessed using linear regression against digital droplet PCR (ddPCR) as the reference method (for CNV2 using probes specific for chromosome 1 as reference and chromosome Y as unknown). Three cell-line reference mixtures (1 related donor, 2 unrelated donors) at mixture fractions from 0.1% to 15% were run in at least triplicate at 15, 30, and 45 ng input DNA mass. The total numbers of measurements were 285 (unrelated) and 349 (related).
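A minimal sketch of the slope, intercept, and R-squared computation for such an accuracy study, assuming ordinary least squares of the assay's donor fraction on the ddPCR reference, both expressed as fractions. The source does not state the regression variant or how its confidence intervals were obtained; the function name is hypothetical and no interval is computed here.

```python
from statistics import fmean
from typing import Sequence, Tuple


def accuracy_regression(reference: Sequence[float], measured: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares of measured on reference; returns (slope, intercept, R-squared)."""
    mx, my = fmean(reference), fmean(measured)
    sxx = sum((x - mx) ** 2 for x in reference)
    sxy = sum((x - mx) * (y - my) for x, y in zip(reference, measured))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(reference, measured))
    ss_tot = sum((y - my) ** 2 for y in measured)
    return slope, intercept, 1 - ss_res / ss_tot
```

A slope near 1 and an intercept near 0 indicate agreement with the reference across the tested range (0.1% to 15%), while R-squared near 1 indicates little scatter around that line.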

Intermediate Precision (Inter-Assay Total Variability)

Quantitative:

Mean CV with 15 ng input=3.10% (95% CI 1.58%, 4.37%)

Mean CV with 30 ng input=3.07% (95% CI 1.42%, 4.50%)

Mean CV with 45 ng input=1.99% (95% CI 1.10%, 2.75%)

Qualitative: 100% concordance (95% CI 54.07%, 100%) between replicates of 6 transplant patient samples.

To quantitatively assess inter-run precision, 3 reference panels (2 unrelated, 1 related) with 15, 30, and 45 ng input DNA mass at 0.1, 0.3, 0.6, 1.2, 2.4, 5, and 10% donor fractions were used. A total of 24 runs (across 23 days) were performed on 17 instruments by different operators using different reagent lots. 15 ng input samples were run with 12 replicates, whereas 30 and 45 ng input samples were run with 6 replicates, resulting in 248, 124, and 126 measurements for 15 ng, 30 ng, and 45 ng input DNA mass, respectively. To qualitatively assess inter-run precision, 6 transplant patient samples were assayed at variable input (20 mL blood each) in duplicate, resulting in 12 measurements.
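The qualitative interval above (54.07% to 100% for 6 of 6 concordant patient samples) appears consistent with an exact Clopper-Pearson binomial interval. When every trial succeeds (x = n), that interval's lower bound reduces to (alpha/2)^(1/n). The sketch below computes only this special case. The function name is hypothetical, and the general case (x < n) would need a beta-distribution quantile, which is omitted here.

```python
def exact_ci_all_concordant(n, alpha=0.05):
    """Two-sided Clopper-Pearson CI for a proportion when all n trials succeed.

    For x == n the upper bound is 1 and the lower bound is (alpha / 2) ** (1 / n).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (alpha / 2) ** (1.0 / n), 1.0


lower, upper = exact_ci_all_concordant(6)
print(f"{lower:.2%} to {upper:.0%}")  # prints "54.07% to 100%"
```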

Sensitivity-Minimum Input

15 ng minimum input tested in samples where input cfDNA concentration was measured.

Limit of Detection

0.15% unrelated donor

0.29% related donor

Limit of Detection was assessed from 3 cell-line (2 unrelated, 1 related) reference panels at 15, 30, and 45 ng input DNA mass, and from 16 plasma-derived cfDNA mixtures at variable input mass, at mixture fractions of 0.1, 0.3, and 0.6%. Samples were run in at least triplicate by two operators using two reagent lots, for a total of 168 (94 from Lot 1, 74 from Lot 2) and 220 (115 from Lot 1, 105 from Lot 2) measurements for unrelated and related donors, respectively. Samples from each reagent lot were evaluated using each of the two dfe methods, unrelated and related donor. LoD values for each lot and each dfe method are calculated using the LoB values of the corresponding method. For each method, the final LoD is the maximum of the Lot 1 and Lot 2 LoD values calculated with that method. Calculations followed the parametric method described in CLSI EP17-A2.
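The parametric LoD calculation can be sketched as follows, based on our reading of the CLSI EP17-A2 parametric approach. Under that approach, LoD = LoB + c_p * SD_L, where SD_L is the pooled standard deviation of the low-level samples and c_p = 1.645 / (1 - 1/(4(L - J))) for L total low-level results from J samples. As in the text, the LoB is taken as an input, and the final LoD is the maximum over reagent lots. The function names are hypothetical, and each low-level sample needs at least two replicates for `statistics.variance`.

```python
import math
from statistics import variance


def pooled_sd(groups):
    """Pooled SD across low-level samples, weighting each by its degrees of freedom."""
    num = sum((len(g) - 1) * variance(g) for g in groups)
    den = sum(len(g) - 1 for g in groups)
    return math.sqrt(num / den)


def parametric_lod(lob, low_level_groups, z=1.645):
    """LoD = LoB + cp * SD_L, applying the small-sample correction to z."""
    total_results = sum(len(g) for g in low_level_groups)  # L
    n_samples = len(low_level_groups)  # J
    cp = z / (1.0 - 1.0 / (4.0 * (total_results - n_samples)))
    return lob + cp * pooled_sd(low_level_groups)


def final_lod(lod_by_lot):
    """Final LoD for one dfe method: the maximum of the per-lot LoD values."""
    return max(lod_by_lot.values())
```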

Lower Limit of Quantitation

Lower limits:

0.15% unrelated donor

0.29% related donor

The lower limit is assessed on the same sample set used for LoD, with a broader range of mixture fractions. Specifically, reference samples were tested at mixture fractions of 0.1, 0.3, 0.6, 1.2, 2.4, 5, 10, and 15%, and plasma-derived cfDNA mixtures were tested at mixture fractions of 0.1, 0.3, 0.6, 1.2, 2.4, 5, and 10%. Samples were run in at least triplicate by two operators using two reagent lots, for a total of 381 (207 from Lot 1, 174 from Lot 2) and 412 (239 from Lot 1, 173 from Lot 2) measurements for unrelated and related donors, respectively. The lower limit is defined as the lowest donor fraction at which the measurement CV (the measurement standard deviation divided by the mean) is less than 20%. This requirement is satisfied over the entire tested range for each reagent lot and dfe method, so the lower LoQ is set equal to the LoD, because the LoQ cannot be lower than the LoD.
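The lower-LoQ rule above can be sketched as a short function. It assumes replicate measurements grouped by nominal donor fraction, takes the lowest level whose CV is below 20%, and floors the result at the LoD. The function names are hypothetical. This sketch does not check that every level above the chosen one also passes, which the text reports was true over the whole tested range.

```python
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def lower_loq(replicates_by_level, lod, max_cv=0.20):
    """Lowest tested donor fraction with CV below max_cv, floored at the LoD.

    replicates_by_level maps a nominal donor fraction to its replicate measurements.
    Returns None if no tested level meets the CV requirement.
    """
    passing = [level for level, reps in replicates_by_level.items() if cv(reps) < max_cv]
    if not passing:
        return None
    # The LoQ cannot be lower than the LoD.
    return max(min(passing), lod)
```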

Upper Limit of Quantitation

Upper limit of quantitation=15% for unrelated and related donors, based on the highest value tested.

Reference Range

Reference range defined as 0 to 1%, based on previously published and approved technology using the same analyte with a corresponding patient population [Bromberg et al, 2017].

Interfering substances

Interference of excess ethanol carry-over and excess EDTA on the multiplex PCR reaction was evaluated using Applicant's cfDNA protocol. An inhibitory effect on the mmPCR reaction was observed at ethanol concentrations of 5% and higher and at EDTA concentrations of 10 mM and higher. Visually hemolyzed samples are excluded from processing.

Critical Reagent Shelf-Life and (as Applicable) Open Stability

Real-time stability studies were used to establish the shelf life of the mmPCR primer pool. The manufacturer-recommended shelf life is used for reagents acquired from third-party vendors (PCR master mix, library preparation enzymes and buffers, standard primers, and NGS reagents for sequencing). Reagent stability is additionally monitored by in-line quality control metrics in every run. Incoming reagent qualification procedures were established for all critical reagents in the workflow (primer pool, PCR enzymes, library preparation enzymes, standard primers, and NGS reagents for sequencing), and only pre-qualified reagents within established expiration dates are used in sample processing.

Specimen Stability: Primary Sample

Primary sample stability was established by retrospective analysis of data derived from Applicant's cfDNA protocol, using 227,450 samples processed between 1 and 8 days after collection. Data were categorized by sample age after collection, and performance was compared across different time points after collection. Based on this analysis, the maximum acceptable sample age for blood collected in Streck BCT tubes was established to be 8 days.

Specimen Stability: Intermediate

Intermediate sample stability for plasma stored at −80° C. and cfDNA libraries stored at −20° C. was established using retrospective data analysis and concordance of results from Applicant's cfDNA protocol with the original time point. Stability of plasma at −80° C. is 25-27 months; stability of cfDNA libraries at −20° C. is 26-30 months.

Clinical Performance: Validity

Description Results (with 95% Confidence Intervals if Applicable)*

Active vs. No Rejection:

Sensitivity=88.7% (77.7%-99.8%)

Specificity=72.6% (65.4%-79.8%)

NPV=95.1% (90.5%-99.7%)

PPV=52.0% (44.7%-59.2%)

In Protocol Biopsy Cohort:

Sensitivity=92.3% (64.0%-99.8%)

Specificity=75.2% (65.7%-83.3%)

NPV=96.7% (90.6%-99.9%)

PPV=55.4% (46.2%-64.7%)
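The clinical performance metrics above follow the standard confusion-matrix definitions, shown below as a minimal sketch. The counts in the usage line are illustrative only: 12 of 13 rejections called positive gives the 92.3% sensitivity figure, while the non-rejection counts are invented. The function name is hypothetical, and the confidence intervals above are not reproduced.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts.

    tp/fn: rejection cases called positive/negative.
    tn/fp: non-rejection cases called negative/positive.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Illustrative counts only: 12 of 13 rejections called positive.
print(round(classification_metrics(tp=12, fn=1, tn=80, fp=20)["sensitivity"], 3))  # 0.923
```

PPV and NPV, unlike sensitivity and specificity, depend on how common rejection is in the tested population.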

The amount of dd-cfDNA was significantly higher in the circulating plasma of the AR group (median=2.32%) than in the non-rejection group (median=0.47%; P<0.0001). Median dd-cfDNA did not differ significantly between different types of rejection: ABMR (2.2%), ABMR/TCMR (2.6%), or TCMR (2.7%) groups (P=0.855). The data demonstrate that the KidneyScan assay can accurately discriminate AR from non-rejection across a range of pathologies, including both acute and chronic findings, in both the ABMR and TCMR groups.

Testing should occur at least 2 weeks post-transplant, to allow any renal injury occurring at surgery or resulting from cadaveric origin to resolve before testing. KidneyScan data are currently equivocal in this timeframe. For this reason, and to be consistent with other commercially available dd-cfDNA tests, we have adopted this criterion for patient safety and data integrity purposes.

Example 7. Further Technology Description

Sample Processing and Sequencing

Whole blood is collected in Streck Cell-Free DNA BCT (blood collection tubes) and shipped to Applicant's CAP/CLIA laboratory, where the samples are processed using the following steps.

- Patient blood samples are centrifuged to separate plasma from blood cells.
- Applicant's in-house proprietary extraction chemistry is used for cfDNA extraction from plasma.
- Extracted cfDNA is subsequently made into a library by ligation of adapters, followed by PCR amplification to increase the total available cfDNA.
- The selected set of thousands of SNP loci is amplified by targeted massively multiplex PCR (mmPCR).
- Amplified samples are barcoded, multiplexed, and sequenced using NGS technology (Illumina NextSeq, 50-cycle single-end reads).

The mmPCR protocol uses Applicant's proprietary chemistry and amplification conditions to achieve uniform amplification across the target set while maintaining an extremely low PCR-introduced error rate. A similar mmPCR approach has been applied to non-invasive prenatal testing, has been published in multiple studies, and has produced over one million patient test results. The SNPs were selected for high variant allele frequency across different ethnicities (FIG. 45). The PCR amplicons are barcoded to enable sample-level multiplexing, and the barcoded samples are pooled and then sequenced on an Illumina NextSeq instrument (50 cycles, single-end reads). FIG. 45 shows cumulative distributions of SNP minor allele frequency according to ethnicity.

Sequencing Analysis

Sequenced reads are demultiplexed and mapped to the standardized human reference genome (hg19) using Novoalign version 2.3.4. Bases are filtered based on Phred quality score, and reads are filtered based on mapping quality score. Multiple quality checks on metrics such as cluster density and mapping rate are applied to the sequencing run, and each sample is confirmed to have obtained the minimum required number of reads after filtering.

From each sequence read we extract only the allele observed at the targeted SNP position. The allele is labeled as reference or mutant following the definitions in the hg19 reference genome, and all subsequent calculations are based on the set of reference and mutant allele counts. At each SNP, the fraction of reference counts out of the total is defined as the SNP's allele ratio.
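The allele-counting step can be sketched as follows. The sketch assumes the base at the targeted SNP position has already been extracted from each filtered read. It also assumes that bases matching neither the hg19 reference allele nor the expected mutant allele are discarded, because the source does not say how such bases are handled. The function names are hypothetical.

```python
from collections import Counter


def count_alleles(observed_bases, ref_base, mut_base):
    """Count reference and mutant alleles among bases observed at one SNP position.

    Bases matching neither allele are ignored in this sketch.
    """
    counts = Counter(observed_bases)
    return counts[ref_base], counts[mut_base]


def allele_ratio(ref_count, mut_count):
    """Fraction of reference counts out of the total (the SNP's allele ratio)."""
    total = ref_count + mut_count
    if total == 0:
        raise ValueError("no informative reads at this SNP")
    return ref_count / total


ref, mut = count_alleles("AAAGC", ref_base="A", mut_base="G")
print(ref, mut, allele_ratio(ref, mut))  # 3 1 0.75
```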

Donor Fraction Calculation

The donor fraction calculation begins by estimating recipient genotypes and eliminating SNPs where the recipient is heterozygous (allele ratio between 30% and 70%; FIG. 46). We define the homozygous reference genotype as RR, the homozygous mutant genotype as MM, and the heterozygous genotype as RM. The donor fraction is calculated from the set of SNPs where the recipient is homozygous, by a method that considers all of the possible donor genotypes.
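The recipient-genotype filter can be sketched as a threshold rule on the allele ratio (the fraction of reference reads), using the 30% and 70% bounds from the text. Two points are assumptions: that the bounds are inclusive on the heterozygous side, and the function names and SNP identifiers, which are hypothetical. With a small donor fraction, a homozygous recipient SNP still falls well outside the 30-70% band, which is why this simple rule can work.

```python
def recipient_genotype(ratio, het_low=0.30, het_high=0.70):
    """Call the recipient genotype from the allele ratio (fraction of reference reads).

    Ratios from het_low to het_high (inclusive, an assumption) are treated as
    heterozygous (RM); higher ratios as RR, lower ratios as MM.
    """
    if ratio > het_high:
        return "RR"
    if ratio < het_low:
        return "MM"
    return "RM"


def homozygous_snps(snps):
    """Keep (snp_id, allele_ratio) pairs where the recipient appears homozygous."""
    return [(snp_id, r) for snp_id, r in snps if recipient_genotype(r) != "RM"]


snps = [("rs1", 0.98), ("rs2", 0.51), ("rs3", 0.03)]  # hypothetical SNPs
print(homozygous_snps(snps))  # [('rs1', 0.98), ('rs3', 0.03)]
```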

When the donor fraction is approximately zero, the observed allele ratios simply reflect the recipient genotypes, so all SNPs where the recipient is homozygous have an allele ratio of approximately 0 or 1 (FIG. 47).

The likelihood of a candidate donor fraction is defined as the probability of producing the observed sequencing data according to a mathematical model of how the data depend on the donor fraction. We assume that the data at each SNP are independent conditioned on the donor fraction, which implies that the combined likelihood is the product of the likelihoods calculated at each SNP.

The likelihood calculation at a single SNP incorporates two sources of uncertainty: the donor genotypes and the sequencing data. The donor genotypes are modeled probabilistically by summing over the set of possible genotypes and weighting them according to their prior probability, defined by the population minor allele frequencies. The sequencing data are modeled using a binomial distribution as a function of the expected allele ratio and the measured number of reads, given an assumed donor genotype and estimated error rates due to sequencing plus PCR.
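The likelihood model described above can be sketched as a grid-search maximum-likelihood estimator. This is an illustrative reconstruction, not Applicant's implementation. The following are assumptions: the Hardy-Weinberg genotype prior built from a per-SNP mutant allele frequency, the symmetric per-read error model, the default error rate, the grid resolution, and all function names.

For a recipient genotype with reference fraction a_R and a donor genotype with reference fraction a_D, the expected reference fraction at donor fraction f is (1 - f) * a_R + f * a_D. This value is passed through the error model into a binomial likelihood. Per-SNP log-likelihoods are summed, reflecting the independence assumption.

```python
import math

# Fraction of reference alleles carried by each genotype.
REF_FRACTION = {"RR": 1.0, "RM": 0.5, "MM": 0.0}


def _logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def snp_log_likelihood(f, ref_count, depth, recipient_genotype, mut_freq, error_rate):
    """Log-likelihood of one SNP's reference count given donor fraction f.

    Sums over donor genotypes weighted by a Hardy-Weinberg prior from the
    population mutant allele frequency (an assumption). The binomial coefficient
    is dropped because it does not depend on f. Requires 0 < error_rate < 0.5.
    """
    q = mut_freq
    priors = {"RR": (1 - q) ** 2, "RM": 2 * q * (1 - q), "MM": q ** 2}
    terms = []
    for genotype, prior in priors.items():
        if prior <= 0:
            continue
        true_ref = (1 - f) * REF_FRACTION[recipient_genotype] + f * REF_FRACTION[genotype]
        # Symmetric per-read error: a base flips to the other allele with prob error_rate.
        p = true_ref * (1 - error_rate) + (1 - true_ref) * error_rate
        terms.append(math.log(prior) + ref_count * math.log(p)
                     + (depth - ref_count) * math.log(1 - p))
    return _logsumexp(terms)


def estimate_donor_fraction(snps, error_rate=1e-3, grid_step=5e-4, max_f=0.5):
    """Grid-search maximum-likelihood donor fraction.

    snps: iterable of (ref_count, depth, mut_freq) at SNPs where the recipient is
    homozygous. The recipient genotype is taken as RR if the allele ratio >= 0.5,
    else MM.
    """
    snps = [(r, n, q, "RR" if r / n >= 0.5 else "MM") for r, n, q in snps]
    best_f, best_ll = 0.0, -math.inf
    for i in range(int(round(max_f / grid_step)) + 1):
        f = i * grid_step
        ll = sum(snp_log_likelihood(f, r, n, g, q, error_rate) for r, n, q, g in snps)
        if ll > best_ll:
            best_f, best_ll = f, ll
    return best_f
```

For example, a 5% donor fraction at SNPs where the recipient is homozygous reference would give about 97.5% reference reads for a heterozygous donor and about 95% for a homozygous-mutant donor. On synthetic data of that shape, the estimator returns a value close to 0.05.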

Materials and Equipment

Major equipment and reagents used in the execution of this assay are detailed in Table 17.

TABLE 17 Critical Equipment and Reagents

Item | Manufacturer | Use
NextSeq Instrument | Illumina | Next Generation Sequencing
GeneAmp PCR System 9700 Thermocycler | Thermo Fisher | Library preparation and PCR Amplification
NextSeq High Output Reagent Cartridge v2, 75 cycles | Illumina | Next Generation Sequencing
NextSeq 500/550 Buffer Cartridge v2 | Illumina | Next Generation Sequencing
NextSeq Flow Cell v2 | Illumina | Next Generation Sequencing
Natera library prep kit | Natera | Preparation of library from cfDNA
Natera cfDNA extraction kit | Natera | cfDNA extraction from plasma
mmPCR primer pool | Natera | Targeted amplification of SNP loci

Example 8. Improved Determination of Transplant Rejection by Using a Threshold Metric that Takes into Account the Body Mass of the Patient

The data acquired in Example 6 were evaluated using a threshold based on donor copies/mL and a further threshold that additionally takes into account the body mass of the patient. The data from Example 6 were derived from 217 biopsy-matched samples (193 patients), of which 38 samples showed active or acute rejection (AR) and 179 showed non-rejection (NON-AR). The cfDNA was quantified for 215 of the samples, and patient mass was measured for 123 of the samples (excluding pediatric patients). The data from these 123 samples are shown in Table 18 below.

TABLE 18 Re-analysis of data from the 123 samples for which patient body mass was measured.

Category | Original Set Samples | Original Set % of Set | New Set Samples | New Set % of Set
AR | 38 | 18% | 31 | 25%
NON-AR | 179 | 82% | 92 | 75%
BL | 72 | 33% | 64 | 52%
OI | 25 | 12% | 18 | 15%
STA | 82 | 38% | 10 | 8%
Total | 217 | | 123 |
Sensitivity | 86.8%* | | 83.9% |
Specificity | 70.9%* | | 75.0% |
Protocol AR | 12 of 13 | | 9 of 10 |

*Recalculated from raw data without accounting for multiple samples from the same patient.

The data were first analyzed using donor-derived copies/mL, which was calculated as follows: (ng cfDNA)/(3.3 pg/haploid genome)*(dd-cfDNA %)/(mL plasma).

To take into account the body mass of the patient, the data were analyzed using donor-derived copies/mL*Patient Mass (abbreviated “donor copies/mL*kg”), which was calculated as follows: (ng cfDNA)/(3.3 pg/haploid genome)*(dd-cfDNA %)/(mL plasma)*(patient kg). This analysis accounts for host blood volume (approximated using patient mass) diluting a signal from a fixed transplant mass.
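The two formulas above can be sketched directly. Converting ng to pg (a factor of 1000) before dividing by 3.3 pg per haploid genome gives genome copies. The sketch takes dd-cfDNA as a percentage and divides by 100, and it assumes "ng cfDNA" is the total mass recovered from the stated plasma volume. The function names are hypothetical.

```python
PG_PER_HAPLOID_GENOME = 3.3  # pg per haploid human genome, as used in the text


def donor_copies_per_ml(ng_cfdna, dd_cfdna_pct, ml_plasma):
    """Donor-derived haploid genome copies per mL of plasma.

    ng_cfdna: total cfDNA mass recovered (ng); dd_cfdna_pct: donor fraction in
    percent (e.g. 1.0 for 1%); ml_plasma: plasma volume extracted (mL).
    """
    genome_copies = ng_cfdna * 1000.0 / PG_PER_HAPLOID_GENOME  # ng -> pg, then copies
    return genome_copies * (dd_cfdna_pct / 100.0) / ml_plasma


def donor_copies_per_ml_kg(ng_cfdna, dd_cfdna_pct, ml_plasma, patient_kg):
    """Donor copies/mL multiplied by patient mass, a proxy for blood-volume dilution."""
    return donor_copies_per_ml(ng_cfdna, dd_cfdna_pct, ml_plasma) * patient_kg


print(round(donor_copies_per_ml(3.3, 1.0, 1.0), 6))  # 10.0
```

Multiplying by mass rather than dividing reflects the reasoning in the text: a larger patient dilutes the same donor-derived signal into more blood, so the raw concentration understates it.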

As shown in Table 19 below, thresholds of 976 donor copies/mL*kg and 13.4 donor copies/mL correspond to a threshold of 1.00% dd-cfDNA.

TABLE 19

Metric | Median | Min | Max | Threshold
ng cfDNA/mL | 4.0 | 0.2 | 353 | N/A
dd-cfDNA % | 0.62% | 0.02% | 23.90% | 1.00%
Donor copies/mL | 9.9 | 0.6 | 857 | 13.4
Donor copies/mL*kg | 727 | 40 | 68,544 | 976

Analyzing the data from Example 6 using the donor copies/mL metric and the donor copies/mL*kg metric as the fixed threshold instead of dd-cfDNA % resulted in the sensitivity and specificity shown in FIG. 48. Using dd-cfDNA % as the threshold metric resulted in a sensitivity of 83.9% and a specificity of 75.0%, and protocol active rejections were correctly called in 9 of 10 cases. Using donor copies/mL as the threshold metric resulted in a sensitivity of 83.9% and a specificity of 75.0%, and protocol active rejections were correctly called in 9 of 10 cases. Using donor copies/mL*kg as the threshold metric resulted in a sensitivity of 77.4% and a specificity of 72.8%, and protocol active rejections were correctly called in 9 of 10 cases. Analyzing the data using donor copies/mL or donor copies/mL*kg as the threshold metric correctly called a protocol active rejection and a T-cell mediated rejection that were missed by dd-cfDNA % (shown with black arrows in FIG. 48).

Example 9. Developing a Scaled Threshold to Improve Performance in Monitoring Transplant by Quantifying Donor Derived cfDNA

The purpose of this Example is to develop a scaled or dynamic threshold that depends on the cfDNA ng/mL in the blood samples obtained from the patients. It was observed that low cfDNA input influenced the estimated dd-cfDNA %. In particular, analyzing the relationship between estimated dd-cfDNA % and input dd-cfDNA % revealed that below 9 ng cfDNA input, the pipeline-estimated dd-cfDNA % increased.

Moreover, there appeared to be a linear relationship between the dd-cfDNA %, donor copies/mL, or donor copies/mL*kg and the amount of cfDNA (ng cfDNA/mL) in the blood samples, as shown in FIG. 49. To further test whether the threshold value varies with ng cfDNA/mL plasma, the sample data were stratified according to quartiles of ng cfDNA/mL plasma, as shown in FIG. 50. Stratification of the data based on ng cfDNA/mL plasma showed that performance could be improved by scaling the threshold value for the different quartiles of cfDNA amount. FIG. 52 shows that the effect of stratification is similar for both antibody mediated rejection (ABMR) and T-cell mediated rejection (TCMR). As shown in FIG. 51, both active or acute rejection (AR) and non-rejection (NON-AR) samples were distributed across the quartiles or octiles of cfDNA amount.

When the dd-cfDNA % threshold metric is used, the results presented in FIG. 50 show that as cfDNA ng/mL increased, specificity increased and sensitivity decreased. The analysis using the dd-cfDNA % threshold metric missed a protocol active rejection in Q4. Table 20 below shows more detailed results from the comparison of fixed versus dynamic thresholds for the dd-cfDNA % threshold metric.

TABLE 20 Comparison of fixed threshold and scaled threshold for the dd-cfDNA % threshold metric.

Parameter | Fixed Q1 | Fixed Q2 | Fixed Q3 | Fixed Q4 | Scaled Q1 | Scaled Q2 | Scaled Q3 | Scaled Q4
Threshold | 1.00% | 1.00% | 1.00% | 1.00% | 1.33% | 1.00% | 1.00% | 0.60%
Sensitivity | 100% | 100% | 90% | 43% | 86% | 100% | 90% | 71%
Specificity | 70% | 64% | 85% | 83% | 83% | 64% | 85% | 75%

Overall | Fixed | Scaled
Sensitivity | 83.9% | 87.1%
Specificity | 75.0% | 76.1%
Protocol ARs | 9/10 | 9/10

For the donor copies/mL threshold metric, the analysis presented in FIG. 50 shows that sensitivity increased and specificity decreased with increasing cfDNA ng/mL plasma. The analysis using the donor copies/mL threshold metric missed a protocol active rejection in Q1. Table 21 below shows more detailed results from the comparison of fixed versus dynamic thresholds for the donor copies/mL threshold metric.

TABLE 21 Comparison of fixed threshold and scaled threshold for the donor copies/mL threshold metric.

Parameter | Fixed Q1 | Fixed Q2 | Fixed Q3 | Fixed Q4 | Scaled Q1 | Scaled Q2 | Scaled Q3 | Scaled Q4
Threshold | 13.4 | 13.4 | 13.4 | 13.4 | 5.0 | 13.4 | 13.4 | 25.0
Sensitivity | 29% | 100% | 100% | 100% | 71% | 100% | 100% | 86%
Specificity | 96% | 80% | 75% | 50% | 83% | 80% | 75% | 71%

Overall | Fixed | Scaled
Sensitivity | 83.9% | 90.3%
Specificity | 75.0% | 77.2%
Protocol ARs | 9/10 | 10/10

For the donor copies/mL*kg threshold metric, the analysis presented in FIG. 50 shows that sensitivity increased and specificity decreased with increasing cfDNA ng/mL plasma. The analysis using the donor copies/mL*kg threshold metric missed a protocol active rejection in Q1. Table 22 below shows more detailed results from the comparison of fixed versus dynamic thresholds for the donor copies/mL*kg threshold metric.

TABLE 22 Comparison of fixed threshold and scaled threshold for the donor copies/mL*kg threshold metric.

Parameter | Fixed Q1 | Fixed Q2 | Fixed Q3 | Fixed Q4 | Scaled Q1 | Scaled Q2 | Scaled Q3 | Scaled Q4
Threshold | 976 | 976 | 976 | 976 | 324 | 976 | 976 | 1952
Sensitivity | 0% | 100% | 100% | 100% | 71% | 100% | 100% | 86%
Specificity | 91% | 76% | 85% | 42% | 78% | 76% | 85% | 83%

Overall | Fixed | Scaled
Sensitivity | 77.4% | 90.3%
Specificity | 72.8% | 80.4%
Protocol ARs | 9/10 | 10/10
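A quartile-scaled threshold of this kind can be sketched as a lookup. Each sample is assigned to a cfDNA ng/mL quartile, and its metric is compared with that quartile's threshold. The sketch uses the scaled donor copies/mL*kg thresholds from Table 22, reading them as 324, 976, 976, and 1952 for Q1 to Q4 (that is, treating 976 as applying to both Q2 and Q3).

Three elements are assumptions, because the source does not specify them: the quartile cut points, deriving them with `statistics.quantiles`, and calling rejection when the metric is at or above the threshold. The function names are hypothetical.

```python
import bisect
from statistics import quantiles

# Scaled donor copies/mL*kg thresholds for cfDNA quartiles Q1..Q4 (Table 22).
SCALED_THRESHOLDS_KG = (324, 976, 976, 1952)


def quartile_cutpoints(cfdna_ng_per_ml):
    """Three cut points splitting a cohort's cfDNA ng/mL values into quartiles."""
    return quantiles(cfdna_ng_per_ml, n=4)


def call_rejection(metric, cfdna_ng_per_ml, cutpoints, thresholds=SCALED_THRESHOLDS_KG):
    """Return True if the metric meets the threshold for the sample's cfDNA stratum."""
    stratum = bisect.bisect_right(cutpoints, cfdna_ng_per_ml)  # 0..3 for Q1..Q4
    return metric >= thresholds[stratum]


cuts = [2.0, 4.0, 8.0]  # hypothetical cut points in ng cfDNA/mL
print(call_rejection(500, 1.5, cuts))    # Q1 threshold 324 -> True
print(call_rejection(1500, 10.0, cuts))  # Q4 threshold 1952 -> False
```

The same lookup extends to the octile stratification in Table 23 by using `n=8` and an eight-entry threshold tuple.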

It was also found that the performance of the analysis can be further improved by splitting the data into smaller ng/mL groupings, as shown in Table 23 below.

TABLE 23 Comparison of fixed threshold and scaled threshold for the donor copies/mL*kg threshold metric when the data are stratified into octiles based on ng cfDNA/mL plasma.

Initial performance at 1% fixed threshold: Sensitivity 83.9%, Specificity 75.0%

Scaled donor copies/mL (thresholds: 5.0, 8.0, 13.4, 17.0, 22.0)
Octile | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Overall
Sensitivity | 60% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 93.5%
Specificity | 90% | 92% | 85% | 75% | 80% | 90% | 69% | 73% | 81.5%

Scaled donor copies/mL*kg (thresholds: 500, 976, 1,600, 1,800)
Octile | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Overall
Sensitivity | 60% | 100% | 100% | 100% | 100% | 83% | 100% | 100% | 90.3%
Specificity | 90% | 92% | 85% | 83% | 80% | 90% | 69% | 64% | 81.5%

Scaled donor copies/mL*kg (thresholds: 100, 500, 1,000, 1,825)
Octile | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Overall
Sensitivity | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100.0%
Specificity | 40% | 92% | 85% | 67% | 80% | 90% | 77% | 64% | 75.0%

In summary, this example showed that the performance of the transplantation monitoring method disclosed herein, including its sensitivity and specificity, may be improved by using a scaled or dynamic threshold metric that takes into account the ng cfDNA/mL plasma obtained from the samples. 100% of protocol biopsy active rejection cases were called correctly when using the scaled thresholds with the new metrics.

ADDITIONAL EMBODIMENTS

Embodiment 1

A method of quantifying the amount of donor-derived cell-free DNA (dd-cfDNA) in a blood sample of a transplant recipient, comprising:

a) extracting DNA from the blood sample of the transplant recipient, wherein the DNA comprises donor-derived cell-free DNA and recipient-derived cell-free DNA;

b) performing targeted amplification at 500-50,000 target loci in a single reaction volume using 500-50,000 primer pairs, wherein the target loci comprise polymorphic loci and non-polymorphic loci, and wherein each primer pair is designed to amplify a target sequence of no more than 100 bp; and

c) quantifying the amount of donor-derived cell-free DNA in the amplification products.

Embodiment 2

A method of quantifying the amount of donor-derived cell-free DNA (dd-cfDNA) in a blood sample of a transplant recipient, comprising:

a) extracting DNA from the blood sample of the transplant recipient, wherein the DNA comprises donor-derived cell-free DNA and recipient-derived cell-free DNA, and wherein the extracting step comprises size selection to enrich for donor-derived cell-free DNA and reduce the amount of recipient-derived cell-free DNA released from bursting white blood cells;

b) performing targeted amplification at 500-50,000 target loci in a single reaction volume using 500-50,000 primer pairs, wherein the target loci comprise polymorphic loci and non-polymorphic loci; and

c) quantifying the amount of donor-derived cell-free DNA in the amplification products.

Embodiment 3

A method of detecting donor-derived cell-free DNA (dd-cfDNA) in a blood sample of a transplant recipient, comprising:

a) extracting DNA from the blood sample of the transplant recipient, wherein the DNA comprises donor-derived cell-free DNA and recipient-derived cell-free DNA;

b) performing targeted amplification at 500-50,000 target loci in a single reaction volume using 500-50,000 primer pairs, wherein the target loci comprise polymorphic loci and non-polymorphic loci;

c) sequencing the amplification products by high-throughput sequencing; and

d) quantifying the amount of donor-derived cell-free DNA.

Embodiment 4

The method of any of the preceding Embodiments, further comprising performing universal amplification of the extracted DNA.

Embodiment 5

The method of any of the preceding Embodiments, wherein the transplant recipient is a mammal.

Embodiment 6

The method of any of the preceding Embodiments, wherein the transplant recipient is a human.

Embodiment 7

The method of any of the preceding Embodiments, wherein the transplant recipient has received a transplant selected from organ transplant, tissue transplant, cell transplant, and fluid transplant.

Embodiment 8

The method of any of the preceding Embodiments, wherein the transplant recipient has received a transplant selected from kidney transplant, liver transplant, pancreas transplant, intestinal transplant, heart transplant, lung transplant, heart/lung transplant, stomach transplant, testis transplant, penis transplant, ovary transplant, uterus transplant, thymus transplant, face transplant, hand transplant, leg transplant, bone transplant, bone marrow transplant, cornea transplant, skin transplant, pancreas islet cell transplant, heart valve transplant, blood vessel transplant, and blood transfusion.

Embodiment 9

The method of any of the preceding Embodiments, wherein the transplant recipient has received a kidney transplant.

Embodiment 10

The method of any of the preceding Embodiments, wherein the quantifying step comprises determining the percentage of donor-derived cell-free DNA out of the total of donor-derived cell-free DNA and recipient-derived cell-free DNA in the blood sample.

Embodiment 11

The method of any of the preceding Embodiments, wherein the quantifying step comprises determining the number of copies of donor-derived cell-free DNA per volume unit of the blood sample.

Embodiment 12

The method of any of the preceding Embodiments, wherein the method further comprises detecting the occurrence or likely occurrence of active rejection of transplantation using the quantified amount of donor-derived cell-free DNA.

Embodiment 13

The method of any of the preceding Embodiments, wherein the method is performed without prior knowledge of donor genotypes.

Embodiment 14

The method of any of the preceding Embodiments, wherein each primer pair is designed to amplify a target sequence of about 50-100 bp.

Embodiment 15

The method of any of the preceding Embodiments, wherein each primer pair is designed to amplify a target sequence of about 60-75 bp.

Embodiment 16

The method of any of the preceding Embodiments, wherein each primer pair is designed to amplify a target sequence of about 65 bp.

Embodiment 17

The method of any of the preceding Embodiments, wherein the targeted amplification comprises amplifying at least 1,000 polymorphic loci in a single reaction volume.

Embodiment 18

The method of any of the preceding Embodiments, wherein the targeted amplification comprises amplifying at least 2,000 polymorphic loci in a single reaction volume.

Embodiment 19

The method of any of the preceding Embodiments, wherein the targeted amplification comprises amplifying at least 5,000 polymorphic loci in a single reaction volume.

Embodiment 20

The method of any of the preceding Embodiments, wherein the method further comprises measuring an amount of one or more alleles at the target loci that are polymorphic loci.

Embodiment 21

The method of any of the preceding Embodiments, wherein the quantifying step comprises detecting the amplified target loci using a microarray.

Embodiment 22

The method of any of the preceding Embodiments, wherein the quantifying step does not comprise using a microarray.

Embodiment 23

The method of any of the preceding Embodiments, wherein the polymorphic loci and the non-polymorphic loci are amplified in a single reaction.

Embodiment 24

The method of any of the preceding Embodiments, wherein the targeted amplification comprises simultaneously amplifying 500-50,000 target loci in a single reaction volume using (i) 500-50,000 different primer pairs, or (ii) 500-50,000 target-specific primers and a universal or tag-specific primer.

Embodiment 25

A method of determining the likelihood of transplant rejection within a transplant recipient, the method comprising:

a) extracting DNA from the blood sample of the transplant recipient, wherein the DNA comprises donor-derived cell-free DNA and recipient-derived cell-free DNA;

b) performing universal amplification of the extracted DNA;

c) performing targeted amplification at 500-50,000 target loci in a single reaction volume using 500-50,000 primer pairs, wherein the target loci comprise polymorphic loci and non-polymorphic loci;

d) sequencing the amplification products by high-throughput sequencing; and

e) quantifying the amount of donor-derived cell-free DNA in the blood sample, wherein a greater amount of dd-cfDNA indicates a greater likelihood of transplant rejection.

Embodiment 26

A method of diagnosing a transplant within a transplant recipient as undergoing acute rejection, the method comprising:

a) extracting DNA from the blood sample of the transplant recipient, wherein the DNA comprises donor-derived cell-free DNA and recipient-derived cell-free DNA;

b) performing universal amplification of the extracted DNA;

c) performing targeted amplification at 500-50,000 target loci in a single reaction volume using 500-50,000 primer pairs, wherein the target loci comprise polymorphic loci and non-polymorphic loci;

d) sequencing the amplification products by high-throughput sequencing; and

e) quantifying the amount of donor-derived cell-free DNA in the blood sample, wherein an amount of dd-cfDNA of greater than 1% indicates that the transplant is undergoing acute rejection.

Embodiment 27

The method of Embodiments 25 or 26, wherein the transplant rejection is antibody mediated transplant rejection.

Embodiment 28

The method of Embodiments 25 or 26, wherein the transplant rejection is T cell mediated transplant rejection.

Embodiment 29

The method of any of Embodiments 25-28, wherein an amount of dd-cfDNA of less than 1% indicates that the transplant is either undergoing borderline rejection, undergoing other injury, or stable.

Embodiment 30

A method of monitoring immunosuppressive therapy in a subject, the method comprising:

a) extracting DNA from the blood sample of the transplant recipient, wherein the DNA comprises donor-derived cell-free DNA and recipient-derived cell-free DNA;

b) performing universal amplification of the extracted DNA;

c) performing targeted amplification at 500-50,000 target loci in a single reaction volume using 500-50,000 primer pairs, wherein the target loci comprise polymorphic loci and non-polymorphic loci;

d) sequencing the amplification products by high-throughput sequencing; and

e) quantifying the amount of donor-derived cell-free DNA in the blood sample, wherein a change in levels of dd-cfDNA over a time interval is indicative of transplant status.

Embodiment 31

The method of Embodiment 30, further comprising adjusting immunosuppressive therapy based on the levels of dd-cfDNA over the time interval.

Embodiment 32

The method of Embodiment 31, wherein an increase in the levels of dd-cfDNA is indicative of transplant rejection and a need for adjusting immunosuppressive therapy.

Embodiment 33

The method of Embodiment 31, wherein no change or a decrease in the levels of dd-cfDNA indicates transplant tolerance or stability, and a need for adjusting immunosuppressive therapy.

Embodiment 34

The method of any of Embodiments 30-33, wherein an amount of dd-cfDNA of greater than 1% indicates that the transplant is undergoing acute rejection.

Embodiment 35

The method of Embodiment 34, wherein the transplant rejection is antibody mediated transplant rejection.

Embodiment 36

The method of Embodiment 34, wherein the transplant rejection is T cell mediated transplant rejection.

Embodiment 37

The method of any of Embodiments 30-33, wherein an amount of dd-cfDNA of less than 1% indicates that the transplant is either undergoing borderline rejection, undergoing other injury, or stable.

Embodiment 38

The method of any of Embodiments 25-37, wherein the method does not comprise genotyping the transplant donor and/or the transplant recipient.

Embodiment 39

The method of any of Embodiments 25-38, wherein the method further comprises measuring an amount of one or more alleles at the target loci that are polymorphic loci.

Embodiment 40

The method of any of Embodiments 25-39, wherein the target loci comprise at least 1,000 polymorphic loci, or at least 2,000 polymorphic loci, or at least 5,000 polymorphic loci, or at least 10,000 polymorphic loci.

Embodiment 41

The method of any of Embodiments 25-40, wherein the target loci are amplified in amplicons of about 50-100 bp in length, or about 50-90 bp in length, or about 60-80 bp in length, or about 60-75 bp in length.

Embodiment 42

The method of Embodiment 41, wherein the amplicons are about 65 bp in length.

Embodiment 43

The method of any of Embodiments 25-42, wherein the transplant recipient is a human.

Embodiment 44

The method of any of Embodiments 25-43, wherein the transplant recipient has received a transplant selected from kidney transplant, liver transplant, pancreas transplant, intestinal transplant, heart transplant, lung transplant, heart/lung transplant, stomach transplant, testis transplant, penis transplant, ovary transplant, uterus transplant, thymus transplant, face transplant, hand transplant, leg transplant, bone transplant, bone marrow transplant, cornea transplant, skin transplant, pancreas islet cell transplant, heart valve transplant, blood vessel transplant, and blood transfusion.

Embodiment 45

The method of Embodiment 44, wherein the transplant recipient has received a kidney transplant.

Embodiment 46

The method of any of Embodiments 25-45, wherein the extracting step comprises size selection to enrich for donor-derived cell-free DNA and reduce the amount of recipient-derived cell-free DNA released from bursting white blood cells.

Embodiment 47

The method of any of Embodiments 25-45, wherein the universal amplification step preferentially amplifies donor-derived cell-free DNA over recipient-derived cell-free DNA released from bursting white blood cells.

Embodiment 48

The method of any one of Embodiments 25-47, further comprising longitudinally collecting a plurality of blood samples from the transplant recipient after transplantation, and repeating steps (a) to (e) for each blood sample collected.

Embodiment 49

The method of any one of Embodiments 1-48, wherein the method has a sensitivity of at least 80% in identifying acute rejection (AR) over non-AR with a cutoff threshold of 1% dd-cfDNA and a confidence interval of 95%.

Embodiment 50

The method of any one of Embodiments 1-48, wherein the method has a specificity of at least 70% in identifying AR over non-AR with a cutoff threshold of 1% dd-cfDNA and a confidence interval of 95%.
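As an illustration only, sensitivity and specificity at a 1% dd-cfDNA cutoff could be computed from labeled samples as sketched below. The sketch assumes that a sample is called AR-positive when its dd-cfDNA fraction is at or above the cutoff, and it uses the Wilson score interval for the 95% confidence interval; neither choice is specified by this disclosure, and the function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def wilson_interval(successes, total, z=Z_95):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clip to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


def sensitivity_specificity(dd_cfdna_pct, is_ar, cutoff_pct=1.0):
    """Call a sample AR-positive when dd-cfDNA >= cutoff (assumed rule)."""
    tp = fn = tn = fp = 0
    for value, ar in zip(dd_cfdna_pct, is_ar):
        positive = value >= cutoff_pct
        if ar and positive:
            tp += 1
        elif ar:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity, wilson_interval(tp, tp + fn),
            specificity, wilson_interval(tn, tn + fp))


# Illustrative values only (percent dd-cfDNA), not data from this disclosure.
values = [2.5, 1.8, 0.6, 3.1, 0.4, 0.9, 1.2, 0.3]
labels = [True, True, True, True, False, False, False, False]
sens, sens_ci, spec, spec_ci = sensitivity_specificity(values, labels)
```

With these toy values, three of four AR samples and three of four non-AR samples are classified correctly, giving a sensitivity and specificity of 0.75 each with wide intervals, as expected for so few samples.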

Embodiment 51

The method of any one of Embodiments 1-48, wherein the method has an area under the curve (AUC) of at least 0.85 in identifying AR over non-AR with a cutoff threshold of 1% dd-cfDNA and a confidence interval of 95%.
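A hedged sketch of an AUC calculation follows. It assumes the rank-based (Mann-Whitney) estimate of the area under the ROC curve, which is one standard way to compute it but is not stated in this disclosure. Because that area summarizes performance over every possible cutoff, the function takes only dd-cfDNA values and AR labels. The name `roc_auc` is hypothetical.

```python
def roc_auc(dd_cfdna_pct, is_ar):
    """ROC AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen AR sample has a higher dd-cfDNA value than a randomly chosen
    non-AR sample, counting ties as one half."""
    ar = [v for v, label in zip(dd_cfdna_pct, is_ar) if label]
    non_ar = [v for v, label in zip(dd_cfdna_pct, is_ar) if not label]
    score = 0.0
    for a in ar:
        for b in non_ar:
            if a > b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(ar) * len(non_ar))


# Illustrative values only (percent dd-cfDNA), not data from this disclosure.
values = [2.5, 1.8, 0.6, 3.1, 0.4, 0.9, 1.2, 0.3]
labels = [True, True, True, True, False, False, False, False]
auc = roc_auc(values, labels)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for clinical cohorts of a few hundred samples; a sort-based rank computation gives the same value faster for larger sets.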

Embodiment 52

The method of any one of Embodiments 1-48, wherein the method has a sensitivity of at least 80% in identifying AR over normal, stable allografts (STA) with a cutoff threshold of 1% dd-cfDNA and a confidence interval of 95%.

Embodiment 53

The method of any one of Embodiments 1-48, wherein the method has a specificity of at least 80% in identifying AR over STA with a cutoff threshold of 1% dd-cfDNA and a confidence interval of 95%.

Embodiment 54

The method of any one of Embodiments 1-48, wherein the method has an AUC of at least 0.9 in identifying AR over STA with a cutoff threshold of 1% dd-cfDNA and a confidence interval of 95%.

Embodiment 55

The method of any one of Embodiments 49-54, wherein the AR is antibody-mediated rejection (ABMR).

Embodiment 56

The method of any one of Embodiments 49-54, wherein the AR is T-cell-mediated rejection (TCMR).

Embodiment 57

The method of any one of the preceding Embodiments, wherein the method has a sensitivity as determined by a limit of blank (LoB) of about 0.5% or less and a limit of detection (LoD) of about 0.5% or less.

Embodiment 58

The method of Embodiment 57, wherein the LoB is about 0.23% or less, and the LoD is about 0.29% or less.
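One common way to derive LoB and LoD values like these is the classical parametric approach described in CLSI EP17. It is sketched below as an assumption, not as the procedure used in this disclosure. LoB is the mean of blank (donor-free) measurements plus 1.645 standard deviations, and LoD is LoB plus 1.645 standard deviations of a low-level positive sample. The sketch omits EP17's small-sample correction factor, and the function names and replicate values are hypothetical.

```python
import statistics

Z_95_ONE_SIDED = 1.645  # one-sided 95% standard-normal quantile


def limit_of_blank(blank_results):
    """LoB: mean of blank results plus 1.645 sample standard deviations."""
    return statistics.mean(blank_results) + Z_95_ONE_SIDED * statistics.stdev(blank_results)


def limit_of_detection(lob, low_level_results):
    """LoD: LoB plus 1.645 sample standard deviations of a low-level positive sample."""
    return lob + Z_95_ONE_SIDED * statistics.stdev(low_level_results)


# Illustrative replicate measurements (percent dd-cfDNA), not data from this disclosure.
blanks = [0.00, 0.02, 0.01, 0.03, 0.04]     # recipient-only material, no donor spike
low_spike = [0.25, 0.30, 0.28, 0.32, 0.35]  # low-level donor spike
lob = limit_of_blank(blanks)
lod = limit_of_detection(lob, low_spike)
```

With these toy replicates the sketch gives an LoB of about 0.046% and an LoD of about 0.109%.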

Embodiment 59

The method of Embodiment 57, wherein the sensitivity is further determined by a limit of quantitation (LoQ), wherein the LoQ is about equal to or greater than the LoD.
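This disclosure does not state how the LoQ is selected. The sketch below assumes a common precision-based rule, shown only for illustration: the LoQ is the lowest tested spike level that is at or above the LoD and whose replicate coefficient of variation does not exceed a target (20% here, an arbitrary assumption). Under this rule any LoQ returned is at least the LoD. The function name and replicate values are hypothetical.

```python
import statistics


def limit_of_quantitation(replicates_by_level, lod, max_cv=0.20):
    """Hypothetical LoQ rule: the lowest spike level that is at or above the
    LoD and whose replicate coefficient of variation is at most max_cv.
    Returns None if no tested level qualifies."""
    for level in sorted(replicates_by_level):
        reps = replicates_by_level[level]
        cv = statistics.stdev(reps) / statistics.mean(reps)
        if level >= lod and cv <= max_cv:
            return level
    return None


# Illustrative spike levels and replicate results (percent dd-cfDNA).
replicates = {
    0.05: [0.03, 0.07, 0.05],
    0.10: [0.09, 0.11, 0.10],
    0.50: [0.48, 0.52, 0.50],
}
loq = limit_of_quantitation(replicates, lod=0.08)
```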

Embodiment 60

The method of Embodiment 59, wherein the LoB is about 0.04% or less, the LoD is about 0.05% or less, and the LoQ is about equal to the LoD.

Embodiment 61

The method of any of the preceding Embodiments, wherein the method has an accuracy as determined by evaluating a linearity value obtained from linear regression analysis of measured donor fractions as a function of the corresponding attempted spike levels, wherein the linearity value is an R² value, and wherein the R² value is from about 0.98 to about 1.0.

Embodiment 62

The method of Embodiment 61, wherein the R² value is about 0.999.

Embodiment 63

The method of any one of the preceding Embodiments, wherein the method has an accuracy as determined by using linear regression on measured donor fractions as a function of the corresponding attempted spike levels to calculate a slope value and an intercept value, wherein the slope value is from about 0.9 to about 1.2 and the intercept value is from about −0.0001 to about 0.01.
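The linearity, slope, and intercept criteria recited in these embodiments can be illustrated with an ordinary least-squares fit of measured donor fraction against attempted spike level. The sketch below rests on assumptions: it treats donor fractions as proportions (0.01 = 1%), which is how the intercept range of about −0.0001 to about 0.01 is read here, and the function names and spike series are hypothetical.

```python
def linearity_fit(spike_levels, measured):
    """Ordinary least-squares fit of measured donor fraction on attempted spike
    level; returns (slope, intercept, r_squared)."""
    n = len(spike_levels)
    mean_x = sum(spike_levels) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in spike_levels)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(spike_levels, measured))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(spike_levels, measured))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return slope, intercept, 1.0 - ss_res / ss_tot


def meets_accuracy_ranges(slope, intercept, r_squared):
    """Check a fit against the slope, intercept, and R^2 ranges recited above."""
    return (0.9 <= slope <= 1.2
            and -0.0001 <= intercept <= 0.01
            and 0.98 <= r_squared <= 1.0)


# Illustrative spike series expressed as fractions (0.01 = 1% donor DNA).
spikes = [0.001, 0.005, 0.01, 0.05, 0.1]
observed = [0.0012, 0.0049, 0.0102, 0.0497, 0.1003]
slope, intercept, r2 = linearity_fit(spikes, observed)
```

A slope near 1 and an intercept near 0 indicate that measured donor fractions track the spiked fractions without proportional or constant bias, while R² reflects how closely the points follow that line.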

Embodiment 64

The method of Embodiment 63, wherein the slope value is about 1, and the intercept value is about 0.

Embodiment 65

The method of any one of the preceding Embodiments, wherein the method has a precision as determined by calculating a coefficient of variation (CV), wherein the CV is less than about 10.0%.
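A minimal sketch of this precision check follows. It assumes the CV is the sample standard deviation of replicate dd-cfDNA measurements divided by their mean; the replicate design, the function name, and the values are hypothetical.

```python
import statistics


def coefficient_of_variation(replicates):
    """CV = sample standard deviation / mean of replicate measurements."""
    return statistics.stdev(replicates) / statistics.mean(replicates)


# Illustrative replicate dd-cfDNA measurements (percent) of one sample.
replicate_results = [1.00, 1.05, 0.95, 1.02, 0.98]
cv = coefficient_of_variation(replicate_results)
within_limit = cv < 0.10  # CV less than about 10.0%
```

For these toy replicates the CV is about 3.8%, which falls under the 10.0% limit.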

What is claimed is:
 1. A method of preparing a blood sample for assessment of organ transplant injury or failure in a transplant recipient that has received an organ from a donor, comprising: receiving a first blood sample from the transplant recipient, the first blood sample being extracted from the transplant recipient at a first time; extracting a first portion from the first blood sample, wherein the first portion includes DNA comprising cell-free DNA derived from the transplant recipient (bg-cfDNA) and DNA derived from the donor of the transplanted organ (dd-cfDNA); and performing amplification of 50-50,000 target loci in the first portion in a single reaction volume using 50-50,000 primer pairs, wherein the target loci comprise polymorphic loci and non-polymorphic loci of up to 100 base pairs.
 2. The method of claim 1, wherein the transplanted organ is a kidney, and wherein the method further includes extracting a second portion from the first blood sample and measuring a serum creatinine level in the second portion.
 3. The method of claim 2, wherein the serum creatinine level is not greater than 20% above a steady-state baseline level.
 4. The method of claim 3, comprising enriching the dd-cfDNA relative to bg-cfDNA in the first extracted sample, amplifying the enriched DNA, and preferentially enriching specific loci in the amplified DNA.
 5. The method of claim 4, wherein the amplification occurs simultaneously for all primers of the target loci.
 6. The method of claim 4, wherein the target loci include greater than 10,000 single-nucleotide polymorphisms (SNPs).
 7. The method of claim 4, comprising: a) receiving a second blood sample taken from the transplant recipient at a second time, the second time being after the first time; b) extracting a second portion from the second blood sample, wherein the second portion includes DNA comprising bg-cfDNA and dd-cfDNA; c) performing amplification of 50-50,000 target loci in the second portion in a single reaction volume using 50-50,000 primer pairs, wherein the target loci in the second portion comprise polymorphic loci and non-polymorphic loci of up to 100 base pairs; and d) inserting the amplification products into a high-throughput sequencing machine.
 8. The method of claim 1, wherein the transplant recipient has experienced a physical injury.
 9. The method of claim 1, further comprising: inserting the amplification products into a high-throughput sequencing machine.
 10. A method of treating a patient that has received a kidney transplant from a donor, wherein the patient has experienced organ injury, comprising: receiving, from a lab, results of testing of a patient blood sample, the results indicative of first and second component levels in the blood sample, wherein the first component level comprises serum creatinine at a first percentage level in the blood sample, and the second component level comprises dd-cfDNA derived from the kidney donor at a second percentage level, the dd-cfDNA having been sampled from an extracted portion of the blood sample and enriched relative to background cell-free DNA of the patient; and when the second percentage level of dd-cfDNA exceeds 1%, or the first percentage level of serum creatinine is more than 20% above a steady-state baseline level, applying a course of immunosuppressant therapy to the patient during a first time period.